[R] is this a bug?

Dennis Murphy djmuser at gmail.com
Sat Jun 18 12:21:33 CEST 2011


Hi:

It's also simpler to use transform() or within(), especially if you
want to create and/or modify multiple variables in a data frame. For
example,

df <- data.frame(weight = round(runif(100, 10, 100)),
                 sex = round(runif(100, 0, 1)))
df <- transform(df, pct = 100 * weight/ave(weight, sex, FUN = sum))
> head(df, 3)
  weight sex      pct
1     87   0 2.425425
2     31   1 1.025471
3     71   0 1.979370
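For comparison, here is a minimal sketch of the within() route mentioned above, creating two variables in one call. The data are simulated as in the example above. set.seed() is added only so the sketch is reproducible, and the helper column name grp_total is illustrative.

```r
set.seed(1)  # only so this sketch produces the same data on every run
df <- data.frame(weight = round(runif(100, 10, 100)),
                 sex    = round(runif(100, 0, 1)))

# within() evaluates the block with the columns visible as variables and
# adds every new variable created there as a column of the result.
df <- within(df, {
  grp_total <- ave(weight, sex, FUN = sum)  # total weight of each row's sex group
  pct <- 100 * weight / grp_total           # row's percentage of its group total
})

str(df)
# Within each sex group the percentages add up to 100.
tapply(df$pct, df$sex, sum)
```

Because both new columns are plain numeric vectors, no nested data frame can appear here.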

HTH,
Dennis


On Sat, Jun 18, 2011 at 2:44 AM, Albert-Jan Roskam <fomcl at yahoo.com> wrote:
> Thanks a lot to all who responded. This is a little less confusing now, although
> it's hard for me to fathom the (practical) use of a dataframe within a
> dataframe. If one mixes different notations, or, put in a different way,
> different underlying classes (data.frame vs. numeric), these rather unintuitive
> results appear.
> So I'll use any of these:
> df$pct <- df$weight / ave(df$weight, df$sex, FUN=sum)*100
> df["pct"] <- df["weight"] / ave(df["weight"], df["sex"], FUN=sum)*100
>
> Using str() is very insightful, as is using class().
>
> I'd prefer it if R simply generated an error when one attempts to nest a
> data.frame within a data.frame.
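R does not raise such an error itself, but a check is easy to add. The sketch below uses a hypothetical helper, assert_flat() (not part of base R), that stops when any column of a data frame is itself a data frame.

```r
# Hypothetical helper (not in base R): error if any column is a data frame.
assert_flat <- function(d) {
  nested <- vapply(d, is.data.frame, logical(1))  # one TRUE/FALSE per column
  if (any(nested)) {
    stop("nested data.frame column(s): ",
         paste(names(d)[nested], collapse = ", "), call. = FALSE)
  }
  invisible(d)
}

DF <- data.frame(A = c(1, 2, 3), B = c(4, 5, 6))
assert_flat(DF)               # passes silently
DF$D <- DF["A"] / DF["B"]     # column D is now a one-column data frame
res <- tryCatch(assert_flat(DF), error = conditionMessage)
print(res)
```

Calling such a check after a batch of assignments is one way to catch the $<- versus [<- mix-up early.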
>
> Thanks again!
>
>  Cheers!!
> Albert-Jan
>
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> All right, but apart from the sanitation, the medicine, education, wine, public
> order, irrigation, roads, a fresh water system, and public health, what have the
> Romans ever done for us?
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>
>
>
> ________________________________
> From: Brian Diggs <diggsb at ohsu.edu>
> To: R-help at r-project.org
> Sent: Fri, June 17, 2011 11:58:44 PM
> Subject: Re: [R] is this a bug?
>
> On 6/17/2011 2:24 PM, (Ted Harding) wrote:
>> And the extra twist in the tale is exemplified by this
>> mini-version of Albert-Jan's first example:
>>
>>    DF<- data.frame(A=c(1,2,3))
>>    DF$B<- c(4,5,6)
>>    DF$C<- c(7,8,9)
>>    DF
>>    #   A B C
>>    # 1 1 4 7
>>    # 2 2 5 8
>>    # 3 3 6 9
>>
>>    DF$D<- DF["A"]/DF["B"]
>>    DF
>>    #   A B C    A
>>    # 1 1 4 7 0.25
>>    # 2 2 5 8 0.40
>>    # 3 3 6 9 0.50
>>
>> ##And why:
>>
>>    DF["A"]/DF["B"]
>>    #      A
>>    # 1 0.25
>>    # 2 0.40
>>    # 3 0.50
>>
>> ##So the ratio DF["A"]/DF["B"] comes out with the name of
>> ##the numerator, "A". This is then the name given to DF$D
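A minimal sketch of this point, using the same DF: arithmetic on two one-column data frames returns a data frame whose column name is taken from the left-hand operand, whereas extracting the columns with [[ first gives a plain numeric vector.

```r
DF <- data.frame(A = c(1, 2, 3), B = c(4, 5, 6), C = c(7, 8, 9))

ratio <- DF["A"] / DF["B"]   # both operands are one-column data frames
class(ratio)                 # "data.frame"
names(ratio)                 # "A": the name comes from the left operand

# Extracting the columns as vectors first gives a plain numeric result.
ratio_vec <- DF[["A"]] / DF[["B"]]
class(ratio_vec)             # "numeric"
```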
>
> It's even slightly weirder than that:
>
> str(DF)
> #'data.frame':   3 obs. of  4 variables:
> # $ A: num  1 2 3
> # $ B: num  4 5 6
> # $ C: num  7 8 9
> # $ D:'data.frame':      3 obs. of  1 variable:
> #  ..$ A: num  0.25 0.4 0.5
>
> There is a column D in DF which is itself a data frame with a single
> column whose name is A (because of what Ted said).  When formatted for
> printing out, the column name of the inner data frame is used (as a
> result of how data.frame() itself handles named arguments when the
> argument is itself a data.frame: "If a list or data frame or matrix is
> passed to data.frame it is as if each component or column had been
> passed as a separate argument...").
>
> So not a bug, but a convoluted set of circumstances that can happen when
> non-atomic vectors are assigned to columns of a data.frame.  That's one
> of those /you shouldn't do that even though it is technically legal or
> at least you shouldn't be surprised when things don't work the way you
> thought they would/ things.
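The behaviour quoted from ?data.frame can be checked directly. The same sketch shows one way to undo the nesting, assuming you simply want the inner column's values stored under the outer name.

```r
# A one-column data frame passed as a named argument contributes its own
# column name; the argument name D is not used.
inner <- data.frame(A = c(0.25, 0.4, 0.5))
outer <- data.frame(D = inner)
names(outer)                 # "A", not "D"

# Repairing a nested column: replace it with the vector it contains.
DF <- data.frame(A = c(1, 2, 3), B = c(4, 5, 6))
DF$D <- DF["A"] / DF["B"]    # nested: DF$D is a data frame
DF$D <- DF$D[[1]]            # now a plain numeric column named D
str(DF)
```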
>
>> Thus Albert-Jan's
>>    df["weight"] / ave(df["weight"], df["sex"], FUN=sum)*100
>> comes through with name "weight".
>>
>> Ted.
>>
>>
>> On 17-Jun-11 21:06:42, William Dunlap wrote:
>>> df$varname is a column of df.
>>>
>>> df["varname"] is a one-column df containing that column.
>>>
>>> df[["varname"]] is a column of df (same as df$varname).
>>>
>>> df[,"varname"] is a column of df (same as df$varname).
>>>
>>> df[,"varname",drop=FALSE] is a one-column df (same as df["varname"]).
>>>
>>> df$newVarname<- df["varname"] inserts a new component
>>> into df, the component being a one-column data.frame,
>>> not the column in that data.frame.
>>>
>>> Bill Dunlap
>>> Spotfire, TIBCO Software
>>> wdunlap tibco.com
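The list above can be verified with class(). The sketch below uses a made-up two-column data frame (the names varname and other are illustrative) and also shows why the two assignment forms in the original question behave differently.

```r
df <- data.frame(varname = c(10, 20, 30), other = c(1, 2, 3))

class(df$varname)                     # "numeric"
class(df[["varname"]])                # "numeric"
class(df[, "varname"])                # "numeric" (drop = TRUE by default)
class(df["varname"])                  # "data.frame"
class(df[, "varname", drop = FALSE])  # "data.frame"

# Same right-hand side, two different assignment forms:
a <- df; a$new   <- df["varname"]     # $<- stores the data frame itself
b <- df; b["new"] <- df["varname"]    # [<- copies the column out of it
is.data.frame(a$new)                  # TRUE  (nested)
is.numeric(b$new)                     # TRUE  (flat)
```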
>>>
>>>> -----Original Message-----
>>>> From: r-help-bounces at r-project.org
>>>> [mailto:r-help-bounces at r-project.org] On Behalf Of Albert-Jan Roskam
>>>> Sent: Friday, June 17, 2011 1:49 PM
>>>> To: R Mailing List
>>>> Subject: [R] is this a bug?
>>>>
>>>> Hello,
>>>>
>>>> Is the following a bug? I always thought that df$varname<- does the
>>>> same as df["varname"]<-
>>>>
>>>>> df<- data.frame(weight=round(runif(10, 10, 100)), sex=round(runif(100, 0, 1)))
>>>>> df$pct<- df["weight"] / ave(df["weight"], df["sex"], FUN=sum)*100
>>>>> names(df)
>>>> [1] "weight" "sex"    "pct"     ### ---------->  ok
>>>>> head(df)
> [[elided Yahoo spam]]
>>>> 1     86   0 2.4002233
>>>> 2     19   1 0.5643006
>>>> 3     32   0 0.8931063
>>>> 4     87   0 2.4281328
>>>> 5     45   0 1.2559308
>>>> 6     95   0 2.6514094
>>>>> rm(df)
>>>>> df<- data.frame(weight=round(runif(10, 10, 100)), sex=round(runif(100, 0, 1)))
>>>>> df["pct"]<- df["weight"] / ave(df["weight"], df["sex"], FUN=sum)*100 ### ----->  this does work
>>>>> names(df)
>>>> [1] "weight" "sex"    "pct"
>>>>> head(df)
>>>>    weight sex       pct
>>>> 1     15   0 0.5246590
>>>> 2     43   0 1.5040224
>>>> 3     17   1 0.9284544
>>>> 4     44   1 2.4030584
>>>> 5     76   1 4.1507373
>>>> 6     59   0 2.0636586
>>>>> do.call(c, R.Version())
>>>>                         platform                            arch
>>>>              "i686-pc-linux-gnu"                          "i686"
>>>>                               os                          system
>>>>                      "linux-gnu"               "i686, linux-gnu"
>>>>                           status                           major
>>>>                               ""                             "2"
>>>>                            minor                            year
>>>>                           "11.1"                          "2010"
>>>>                            month                             day
>>>>                             "05"                            "31"
>>>>                          svn rev                        language
>>>>                          "52157"                             "R"
>>>>                   version.string
>>>> "R version 2.11.1 (2010-05-31)"
>>>>> # Thanks!
>>>>
>>>> Cheers!!
>>>> Albert-Jan
>>>>
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>
>> --------------------------------------------------------------------
>> E-Mail: (Ted Harding)<ted.harding at wlandres.net>
>> Fax-to-email: +44 (0)870 094 0861
>> Date: 17-Jun-11                                       Time: 22:24:41
>> ------------------------------ XFMail ------------------------------
>>
>
>
> --
> Brian S. Diggs, PhD
> Senior Research Associate, Department of Surgery
> Oregon Health & Science University
>
>


