[R] Subsetting dataframes
Uwe Ligges
ligges at statistik.uni-dortmund.de
Thu Jul 19 15:01:48 CEST 2007
CG Pettersson wrote:
> Dear all!
>
> W2k, R 2.5.1
>
> I am working with an ongoing malting barley variety evaluation within
> Sweden. The structure is 25 cultivars tested each year at four sites, in
> field trials with three replicates and 'lattice' structure (the replicates
> are divided into five sub blocks in a structured way). As we are normally
> keeping around 15 varieties from each year to the next, and take in 10 new
> for next year, we have tested a total of 72 different varieties during five
> years.
>
> I store the data in a field trial database, and generate text tables with
> the subset of data I want and import the frame to R. I take in all
> cultivars in R and use 'subset' to select what I want to look at. Using
> lme{nlme} works with no problems to get mean results over the years, but
> as I now have a number of years I want to analyse the general site x
> cultivar relation. I am testing AMMI{agricolae} for this and it seems to
> work except for the subsetting. This is what happens:
>
> If I do the subsetting like this:
>
> x62_samvar <- subset(x62_5, cn %in%
> c("Astoria","Barke","Christina","Makof", "Prestige","Publican","Quench"))
>
> A test run with AMMI seems to work in the first part:
>
>> AMMI(site, cn, rep, yield)
>
> ANALYSIS AMMI: yield
> Class level information
>
> ENV: Hag Klb Bjt Ska
> GEN: Astoria Prestige Makof Christina Publican Quench
> REP: 1 2 3
>
> Number of observations: 240
>
> model Y: yield ~ ENV + REP%in%ENV + GEN + ENV:GEN
>
> Analysis of Variance Table
>
> Response: Y
>            Df    Sum Sq  Mean Sq F value    Pr(>F)
> ENV         3 120092418 40030806 90.0424 1.665e-06 ***
> REP(ENV)    8   3556620   444578  0.5674  0.803923
> GEN         5  21376142  4275228  5.4564 9.680e-05 ***
> ENV:GEN    15  28799807  1919987  2.4504  0.002555 **
> Residuals 208 162973213   783525
> ---
> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Coeff var   Mean yield
>  13.08629     6764.098
>
> After this something goes wrong, as AMMI finds a cultivar name not
> selected in the subsetting. (The plotting might go wrong anyhow, but I
> haven't got that far yet):
>
> Error in model.frame.default(Terms, newdata, na.action = na.action, xlev =
> object$xlevels) :
> factor 'y' has new level(s) Arkadia
>
>
> Looking at the dataframe using
>
>> edit(x62_samvar)
>
> only shows the selected lines, but using levels() gives another answer as
>
>> levels(x62_samvar$cn)
>
> gives back all 72 cultivar names used during the five years (starting with
> Arkadia).
>
> Where do I go wrong and how do I use subset in a proper way?
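subset() selects rows, but a factor keeps all of its levels, including
those that no longer occur in the data. So you have to drop the levels
you are excluding. Example: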
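x <- factor(letters[1:4])
x                   # levels: a b c d
x[1:2]              # values a b, but the levels are still a b c d
x[1:2, drop=TRUE]   # values a b, unused levels c and d are dropped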
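Applied to a data frame, the same idea might look like the sketch below. The data are a small made-up stand-in (only the names x62_5, x62_samvar and cn come from the question; the yield values and the reduced cultivar list are assumptions for illustration). Re-creating the factor with factor() is one common way to discard unused levels; later versions of R than the 2.5.1 mentioned above also provide droplevels() for the same purpose.

```r
# Hypothetical stand-in for the poster's data frame (values are invented).
x62_5 <- data.frame(
  cn    = factor(c("Arkadia", "Astoria", "Barke", "Astoria", "Barke", "Quench")),
  yield = c(6500, 6700, 6900, 6800, 7000, 7100)
)

# subset() keeps only the requested rows ...
x62_samvar <- subset(x62_5, cn %in% c("Astoria", "Barke"))

# ... but the factor still carries every level of the full data set.
levels_before <- levels(x62_samvar$cn)
print(levels_before)

# Re-create the factor so that only levels present in the data remain.
x62_samvar$cn <- factor(x62_samvar$cn)
levels_after <- levels(x62_samvar$cn)
print(levels_after)
```

After this step, model-fitting code that inspects levels(x62_samvar$cn) should only see the selected cultivars. Whether that is enough for AMMI depends on how the function reads its arguments, which I have not checked.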
Uwe Ligges
> Thanks
> /CG
>