[R] adding zeros to dataframe
wilkinsonmr at gmail.com
Fri May 1 22:58:50 CEST 2009
I interpreted your situation a little differently than the other
responses. Please ignore this if their suggestions solved your problem.
I assumed you have abundance where available, but otherwise it wasn't
recorded--not as NA, just unrecorded. You want to fill in the missing
"rows" with zeros for abundance, for each treatment, for 272 plots
within treatment, for all possible species within a plot.
(I now see from your repost that this is the case.)
R code and comments follow.
## I'll try to reproduce some of your data. You can ignore this part
## for your code.
## Say there are 5 treatments, 272 plots per treatment, and 10
## *possible* species
your.data <- expand.grid(treatment = c("A", "B", "C", "D", "E"),
                         plot.location = 1:272,
                         species = paste("s", 1:10, sep = ""))
your.data$abundance <- rpois(nrow(your.data), 3)
your.data <- your.data[sample(nrow(your.data), size = 100), ]
row.names(your.data) <- seq(nrow(your.data))
## Your data looks something like this:
head(your.data)
## You need to generate all combinations of values of your variables
## Assuming all are currently represented somewhere in your data set,
(treatments <- unique(your.data$treatment[!is.na(your.data$treatment)]))
plot.locations <- 1:272 # or
(species <- unique(your.data$species[!is.na(your.data$species)]))
## The complete data with all species, for all locations, for all
## treatments, present is
complete.data <- expand.grid(tx = treatments, pl = plot.locations,
                             sp = species)
## Put the two together, with NA for unrecorded abundance
your.complete.data <- merge(complete.data, your.data,
                            by.x = c("tx", "pl", "sp"),
                            by.y = c("treatment", "plot.location", "species"),
                            all.x = TRUE)
## Fill in the NAs
your.complete.data$abundance[is.na(your.complete.data$abundance)] <- 0
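If it helps to see the whole idea at a glance, here is a minimal sketch of the same expand.grid / merge / fill steps on a toy data set small enough to check by eye. Everything in it (the object names recorded, full.grid and filled, and the 2 treatments x 3 plots x 2 species layout) is made up for illustration and is not taken from your data.

```r
## Toy, made-up data: 2 treatments, 3 plots, 2 possible species,
## but only 4 of the 12 possible combinations were recorded.
recorded <- data.frame(
  treatment     = c("A", "A", "B", "B"),
  plot.location = c(1L, 3L, 2L, 2L),
  species       = c("s1", "s2", "s1", "s2"),
  abundance     = c(4, 1, 2, 7),
  stringsAsFactors = FALSE
)

## Every treatment x plot x species combination (2 * 3 * 2 = 12 rows)
full.grid <- expand.grid(treatment     = c("A", "B"),
                         plot.location = 1:3,
                         species       = c("s1", "s2"),
                         stringsAsFactors = FALSE)

## Keep every row of the grid; combinations that were never recorded
## get NA for abundance. The shared column names are used as keys.
filled <- merge(full.grid, recorded, all.x = TRUE)

## Replace those NAs with zeros
filled$abundance[is.na(filled$abundance)] <- 0

## Sanity checks: all 12 combinations present, no NA left,
## and the recorded abundances are unchanged in total
stopifnot(nrow(filled) == 12, !anyNA(filled$abundance))
stopifnot(sum(filled$abundance) == sum(recorded$abundance))

## Each treatment x species cell should now have one row per plot
table(filled$treatment, filled$species)
```

The same kind of table() check on your real result should show 272 in every treatment-by-species cell, assuming each combination was recorded at most once in your original data.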
Hope this helps,
On Fri, May 1, 2009 at 12:20 PM, Collins, Cathy <ccollins at ku.edu> wrote:
> I am new to R and am hoping to get some tips from experienced R-programmers.
> I have a dataset that I've read into R as a dataframe. There are 5 columns: Plot location, species name, a species number code (unique to each species name), abundance, and treatment. There are 272 plots in each treatment, but only the plots in which the species was recorded have an abundance value. For all species in the dataset, I would like to add zeros to the abundance column for any plots in which the species was not recorded, so that each species has 272 rows. The data are sorted by species and then abundance, so all of the zeros can presumably just be tacked on to the last (272-occupied plots) row for each species.
> My programming skills are still somewhat rudimentary (and biased toward VBA-style looping...which seems to be leading me astray). Though I have searched, I have not yet seen this particular problem addressed in the help files.
> Many thanks for any suggestions,