[R] aggregate() function, strange behavior for augmented data

David Afshartous dafshartous at med.miami.edu
Mon Jun 16 17:50:05 CEST 2008


Everything was read in the same way, and str(junk1) confirms that the original
and augmented data have the same structure.  This is very strange.

## original data:
> str(junk1)
'data.frame':   96 obs. of  3 variables:
 $ Hour: int  0 3 5 0 3 5 0 3 5 0 ...
 $ Drug: Factor w/ 2 levels "D","P": 2 2 2 1 1 1 2 2 2 1 ...
 $ Aldo: int  9 15 4 8 13 3 5 11 5 7 ...

## augmented data:
> str(junk1)
'data.frame':    108 obs. of  3 variables:
 $ Hour: int  0 3 5 0 3 5 0 3 5 0 ...
 $ Drug: Factor w/ 2 levels "D","P": 2 2 2 1 1 1 2 2 2 1 ...
 $ Aldo: int  9 15 4 8 13 3 5 11 5 7 ...
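
Since str() looks identical for both versions of junk1, one way to narrow
this down is to compare column classes before and after the aggregate() call.
The following is only a minimal sketch on a toy data frame: the names toy and
toy.mean are hypothetical, and the last two Aldo values are made up. Only the
aggregate() call itself is taken from this thread. Whether Hour comes back as a
factor may depend on the R version and the data, so no expected output is
shown.

```r
# Toy data with the same layout as junk1 (hypothetical; last two Aldo values invented)
toy <- data.frame(
  Hour = rep(c(0L, 3L, 5L), times = 4),
  Drug = factor(rep(c("P", "D"), each = 3, times = 2)),
  Aldo = c(9L, 15L, 4L, 8L, 13L, 3L, 5L, 11L, 5L, 7L, 12L, 4L)
)

# Same call as in the thread: mean of Aldo for each Hour x Drug combination
toy.mean <- aggregate(toy[3], toy[c(1, 2)], mean)

# Class of every column in the input and in the aggregated result
in.classes  <- sapply(toy, class)
out.classes <- sapply(toy.mean, class)
print(in.classes)
print(out.classes)

# TRUE if the grouping variable was returned as a factor
hour.is.factor <- is.factor(toy.mean$Hour)
print(hour.is.factor)
```

Running the same two sapply() lines on the original and the augmented junk1
would show directly which column changes class, and at which step.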


On 6/16/08 11:37 AM, "markleeds at verizon.net" <markleeds at verizon.net> wrote:

> 
> hi: do str(junk1) and it will tell you what the components of junk1 are.
> 
> the only thing i can think of is that you used stringsAsFactors=FALSE
> when you (probably) used read.table to read in junk, but you didn't use
> that option when you used read.table to read in junk1?
> 
> 
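
To illustrate the stringsAsFactors suggestion above, here is a small sketch
that reads the same text twice with read.table. It assumes an R version in
which read.table accepts the text= argument; the names txt, a, and b are
hypothetical, and the rows are taken from the first values shown in this
thread.

```r
# A few rows in the same layout as junk1, supplied inline instead of from a file
txt <- "Hour Drug Aldo
0 P 9
3 P 15
5 P 4
0 D 8"

# Read once with character columns converted to factors, once without
a <- read.table(text = txt, header = TRUE, stringsAsFactors = TRUE)
b <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)

print(sapply(a, class))  # Drug is a factor here
print(sapply(b, class))  # Drug is a character vector here
```

In this sketch the option only changes how the character column Drug is read;
Hour is read as integer in both cases.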
> On Mon, Jun 16, 2008 at 11:30 AM, David Afshartous wrote:
> 
>> All,
>> 
>> I'm re-running some analysis that has been augmented with additional data.
>> When I use the exact same code for the augmented data, the behavior of the
>> aggregate function is very strange, viz., one of the resulting variables is
>> now coded as a factor while it was coded as numeric for the original data.
>> Unfortunately, I cannot provide a reproducible code example since it only
>> seems to occur with this data.  I've checked and re-checked both the
>> original and augmented data but nothing appears inconsistent.  Any
>> suggestions much appreciated.  See below for specifics.
>> 
>> Cheers,
>> David
>> 
>> # original data
>>> dim(junk1)
>> [1] 96  3
>>> junk1[1,]
>>   Hour Drug Aldo
>> 1    0    P    9
>>> junk1$Hour
>>  [1] 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3
>> [39] 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0
>> [77] 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5   ### not coded as a factor
>>> junk1.mean.time.drug = aggregate(junk1[3], junk1[c(1,2)], mean)
>>> junk1.mean.time.drug$Hour
>> [1] 0 3 5 0 3 5  ### not coded as a factor
>> 
>> # augmented data
>>> dim(junk1)
>> [1] 108   3
>>> junk1[1,]
>>   Hour Drug Aldo
>> 1    0    P    9
>>> junk1$Hour
>>   [1] 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3
>>  [51] 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0
>> [101] 3 5 0 3 5 0 3 5    ### not coded as a factor
>>> junk1.mean.time.drug = aggregate(junk1[3], junk1[c(1,2)], mean)
>>> junk1.mean.time.drug$Hour
>> [1] 0 3 5 0 3 5
>> Levels: 0 3 5    ################## coded as a factor now!
>> 
>> ## of course, I can recode it again, but I'm curious as to why this is
>> ## changing here
>> 
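
On recoding the factor back to numbers, as mentioned above: calling
as.numeric() directly on a factor returns the internal level codes rather than
the original values, so the conversion should go through the level labels. A
minimal sketch, using a stand-in factor hour.f (a hypothetical name) with the
same levels as the Hour column shown above:

```r
# Stand-in for junk1.mean.time.drug$Hour when it comes back as a factor
hour.f <- factor(c(0, 3, 5, 0, 3, 5))

# Direct conversion gives the level codes, not the hours
codes <- as.numeric(hour.f)

# Going through the labels recovers the original values
values <- as.numeric(as.character(hour.f))

# Equivalent idiom that converts each level only once
values2 <- as.numeric(levels(hour.f))[hour.f]

print(codes)
print(values)
```

Here codes holds 1 2 3 1 2 3 while values and values2 hold 0 3 5 0 3 5.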