[R] Subsetting dataframes
CG Pettersson
cg.pettersson at vpe.slu.se
Thu Jul 19 11:52:09 CEST 2007
Dear all!
W2k, R 2.5.1
I am working with an ongoing malting barley variety evaluation within
Sweden. The structure is 25 cultivars tested each year at four sites, in
field trials with three replicates and 'lattice' structure (the replicates
are divided into five sub blocks in a structured way). As we normally
keep around 15 varieties from each year to the next, and take in 10 new
ones for the next year, we have tested a total of 72 different varieties
over five years.
I store the data in a field trial database, and generate text tables with
the subset of data I want and import the frame to R. I take in all
cultivars in R and use 'subset' to select what I want to look at. Using
lme{nlme} works with no problems to get mean results over the years, but
as I now have a number of years I want to analyse the general site x
cultivar relation. I am testing AMMI{agricolae} for this and it seems to
work except for the subsetting. This is what happens:
If I do the subsetting like this:
x62_samvar <- subset(x62_5, cn %in%
c("Astoria","Barke","Christina","Makof", "Prestige","Publican","Quench"))
A test run with AMMI seems to work in the first part:
> AMMI(site, cn, rep, yield)
ANALYSIS AMMI: yield
Class level information
ENV: Hag Klb Bjt Ska
GEN: Astoria Prestige Makof Christina Publican Quench
REP: 1 2 3
Number of observations: 240
model Y: yield ~ ENV + REP%in%ENV + GEN + ENV:GEN
Analysis of Variance Table
Response: Y
           Df    Sum Sq  Mean Sq F value    Pr(>F)
ENV         3 120092418 40030806 90.0424 1.665e-06 ***
REP(ENV)    8   3556620   444578  0.5674  0.803923
GEN         5  21376142  4275228  5.4564 9.680e-05 ***
ENV:GEN    15  28799807  1919987  2.4504  0.002555 **
Residuals 208 162973213   783525
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Coeff var   Mean yield
 13.08629     6764.098
After this something goes wrong, as AMMI finds a cultivar name not
selected in the subsetting. (The plotting might go wrong anyhow, but I
haven't got that far yet):
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev =
object$xlevels) :
factor 'y' has new level(s) Arkadia
Looking at the data frame using
> edit(x62_samvar)
shows only the selected rows, but
> levels(x62_samvar$cn)
returns all 72 cultivar names used during the five years (starting with
Arkadia).
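The same behaviour can be reproduced on a small scale. The sketch below is only an illustration: the data frame 'toy', the object names and the yield values are all invented and are not my trial data. It shows that the rows are filtered while the factor keeps its original levels, and that re-creating the factor with factor() keeps only the levels that actually occur. Whether that is enough to make AMMI work on the real data I cannot say.

```r
# Toy data frame; the object names and all values are invented for illustration.
toy <- data.frame(
  cn    = factor(c("Arkadia", "Astoria", "Barke", "Quench")),
  yield = c(6500, 6800, 6700, 7000)
)

# Same kind of subsetting as used for x62_samvar
toy_sub <- subset(toy, cn %in% c("Astoria", "Quench"))

# Only the selected rows remain ...
n_rows <- nrow(toy_sub)

# ... but the factor still carries all of its original levels
levels_before <- levels(toy_sub$cn)

# Re-creating the factor keeps only the levels present in the data
toy_sub$cn <- factor(toy_sub$cn)
levels_after <- levels(toy_sub$cn)

print(n_rows)
print(levels_before)
print(levels_after)
```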
Where am I going wrong, and how do I use subset properly?
Thanks
/CG
--
CG Pettersson, PhD
Swedish University of Agricultural Sciences (SLU)
Dept. of Crop Production Ecology. Box 7043.
SE-750 07 Uppsala, Sweden
cg.pettersson at vpe.slu.se