[R] str(data.frame) after subsetting reflects original structure, not subsetted structure?

Bryan Hanson hanson at depauw.edu
Fri Jul 24 16:16:04 CEST 2009


Thanks Marc and Ben...

Your answers were most helpful.

I suspected something had been written about it, but was having trouble
formulating a reasonable search query. I was looking in the help page for
str(), which was sort of a dead end.

Bryan
*************
Bryan Hanson
Professor of Chemistry & Biochemistry
DePauw University, Greencastle IN USA



On 7/24/09 9:46 AM, "Marc Schwartz" <marc_schwartz at me.com> wrote:

> On Jul 24, 2009, at 8:17 AM, Bryan Hanson wrote:
> 
>> I find that after subsetting (you may prefer "conditional selection") a
>> data frame and assigning it to a new object, the str(new object) reflects
>> the original data frame, not the new one:
>> 
>> A <- rnorm(20)
>> B <- factor(rep(c("t", "g"), 10))
>> C <- factor(rep(c("h", "l"), 10))
>> D <- data.frame(A, B, C)
>> 
>> str(D) # reports correctly
>> 
>> E <- D[D$C == "h",]
>> 
>> str(E) # reports that E$C still has 2 levels, but
>> E # or E$C shows that subsetting worked properly
>> summary(E) # shows the original structure and that subsetting worked
>> 
>> Is this the expected behavior, and if so, is there a particular rationale?
>> I would be pretty certain that the information about E was inherited from D,
>> but why wasn't it updated to reflect the revised object?  Is there an
>> argument that I can use to force the updating?
>> 
>> For better or worse, I use str() a lot to check my work, and in this case,
>> it seems to have misled me.
>> 
>> Thanks as always, Bryan
> 
> See ?"[.factor" which is the extract (subset) method for factors. Note
> that the 'drop' argument is FALSE by default. It is this argument that
> controls the retention of unused factor levels.
> 
> The reason that it is FALSE by default is to ensure that if you are
> comparing factors from more than one data source, comparisons of the
> factor levels, and any other use of them, remain consistent.
> 
> For one approach to dropping unused factor levels from a data frame,
> see:
> 
> http://wiki.r-project.org/rwiki/doku.php?id=tips:data-manip:drop_unused_levels
> 
> HTH,
> 
> Marc Schwartz
>

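As a rough illustration of the 'drop' argument Marc describes, the sketch
below rebuilds the data frame from the original post and shows three ways to
discard unused levels. The set.seed() call and the names E1, E2 and E3 are
assumptions added for the example. droplevels() was added to base R after
this thread was written, so it is only available in later versions. None of
this is necessarily what the wiki page linked above recommends.

```r
# Rebuild the example from the original post.
set.seed(1)  # assumption: added only so the sketch is repeatable
A <- rnorm(20)
B <- factor(rep(c("t", "g"), 10))
C <- factor(rep(c("h", "l"), 10))
D <- data.frame(A, B, C)

E <- D[D$C == "h", ]
nlevels(E$C)  # 2: the unused level "l" is still recorded

# Option 1: the 'drop' argument of "[.factor", one column at a time.
E1 <- E
E1$C <- E1$C[, drop = TRUE]
levels(E1$C)  # "h"

# Option 2: calling factor() again rebuilds the level set from the data.
E2 <- E
E2$C <- factor(E2$C)

# Option 3 (later R versions only): drop unused levels in every factor column.
E3 <- droplevels(E)
str(E3)
```

After any of these, str() reports only the levels that actually occur in the
subset.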

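The rationale for keeping unused levels can also be sketched in a few lines.
The vectors 'high' and 'low' are hypothetical stand-ins for factors that come
from different data sources but share one coding scheme.

```r
C <- factor(rep(c("h", "l"), 10))
high <- C[C == "h"]  # subset that only contains "h"
low  <- C[C == "l"]  # subset that only contains "l"

# With the default drop = FALSE both subsets keep the full level set,
# so tables built from them have the same layout.
table(high)
table(low)
identical(levels(high), levels(low))  # TRUE

# Once unused levels are dropped, the level sets no longer match ...
identical(levels(high[, drop = TRUE]), levels(low[, drop = TRUE]))  # FALSE

# ... and the underlying integer codes change meaning.
as.integer(low)[1]                 # 2 ("l" is the second of two levels)
as.integer(low[, drop = TRUE])[1]  # 1 ("l" is now the only level)
```

This is one reason it can be safer to drop levels explicitly, at the point
where the subset is known to be final.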

