[R] behavior of "by"

Jeff Laake Jeff.Laake at noaa.gov
Wed Oct 29 18:49:03 CET 2008


Again, thanks for the input.  I've been a recipient of this list for 
quite a few years although I don't post often.  It is an invaluable 
resource and I appreciate the effort of all the contributors.  I support 
a lot of software so I know how much work it can be.

I've seen the "reproducible code" request at the bottom of the messages,
but until I got an off-list explanation I had no idea that it meant an
example in which the "code AND data can be copied and pasted directly
into R".  I re-read the posting guide and it does suggest providing an
example, but adding a definition of "reproducible code" might help.  I've
always interpreted "reproducible" to mean an example that can be used to
"reproduce" the error.  That is a bit of a catch-22, though: I couldn't
reproduce the problem with a simple example because I didn't know what
needed reproducing, even though I could demonstrate it within my own
code.  It didn't make sense until I used str() on the data frame.
Sometimes what is needed is some insight about what to look for.  Now
that I understand the problem, below is the "reproducible code and data":

# y comes from tapply(), so it is a 1-d array rather than a plain vector
df <- data.frame(x = rep(1, 3),
                 y = tapply(1:9, factor(rep(c("A", "B", "C"), each = 3)), sum))
df
by(df$y, df$x, length)
tapply(df$y, df$x, length)
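
The str() check mentioned above can be sketched as follows. This is only a
minimal illustration, not a definitive diagnosis: the as.vector() workaround
and the name "counts" are assumptions added here, and whether the column
keeps its array dimensions inside the data frame may depend on the R version.

```r
# Same data as above; tapply() returns a 1-d array, not a plain vector.
y <- tapply(1:9, factor(rep(c("A", "B", "C"), each = 3)), sum)
df <- data.frame(x = rep(1, 3), y = y)

str(df)      # shows how each column is actually stored
dim(df$y)    # non-NULL if the column still carries array dimensions

# Assumed workaround: drop the dimensions so y is an ordinary numeric vector.
df$y <- as.vector(df$y)
counts <- tapply(df$y, df$x, length)
counts

# dput() prints a representation of df that can be pasted straight into R,
# which is one way to include the data in a posting.
dput(df)
```

After as.vector(), y no longer has a dim attribute, so it behaves like any
other numeric column.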

As suggested, table(df$x) would have been the tidier solution for what 
I wanted to do. 
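
A minimal sketch of that table() alternative, assuming the goal is simply
the number of rows for each value of x.  The plain numeric y and the name
"n_by_x" are illustrative assumptions, not part of the original example.

```r
# Count rows per value of x; how y is stored does not matter here.
df <- data.frame(x = rep(1, 3), y = c(6, 15, 24))
n_by_x <- table(df$x)
n_by_x
```

Because table() only looks at x, the result does not depend on how the y
column was created.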

regards --jeff


Sebastian P. Luque wrote:
> On Tue, 28 Oct 2008 18:04:57 -0700,
> Jeff Laake <Jeff.Laake at noaa.gov> wrote:
>
>   
>> Any insight into the behavior of "by" in the following case would be
>> appreciated.  There is a note in the help details for "by" about
>> documenting behavior since v2.7 but I don't entirely understand what
>> it is saying.  I'm using R2.7.2 Windows.  I'm interested if the
>> following behavior was a change or whether it has always worked this
>> way.  I looked at RSiteSearch and read through version changes but
>> found nothing.
>>     
>
>   
>> Take a dataframe as follows:
>>     
>>> samples
>>>       
>>    Region.Label  Area Sample.Label Effort Label
>> 1             1 10000            1    100    11
>> 2             1 10000            2    100    12
>> 3             1 10000            3    100    13
>> 4             1 10000            4    100    14
>> 5             1 10000            5    100    15
>> 6             1 10000            6    100    16
>> 7             1 10000            7    100    17
>> 8             1 10000            8    100    18
>> 9             1 10000            9    100    19
>> 10            1 10000           10    100   110
>>     
>
> I cannot reproduce your results (please provide reproducible code), but:
>
> table(samples$Region.Label)
>
> is simpler for this purpose.
>
>
>


