[R] is this a bug?
Dennis Murphy
djmuser at gmail.com
Sat Jun 18 12:21:33 CEST 2011
Hi:
It's also simpler to use transform() or within(), especially if you
want to create and/or modify multiple variables in a data frame. For
example,
df <- data.frame(weight = round(runif(100, 10, 100)),
                 sex = round(runif(100, 0, 1)))
df <- transform(df, pct = 100 * weight/ave(weight, sex, FUN = sum))
> head(df, 3)
  weight sex      pct
1     87   0 2.425425
2     31   1 1.025471
3     71   0 1.979370
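
Since within() is mentioned above but not shown, here is a minimal
sketch of the same calculation with it. The seed and the helper column
name group_total are only illustrative choices and not part of the
original example.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
df <- data.frame(weight = round(runif(100, 10, 100)),
                 sex    = round(runif(100, 0, 1)))

# within() evaluates the block with the columns of df in scope and
# writes variables created inside the block back as new columns.
df <- within(df, {
  group_total <- ave(weight, sex, FUN = sum)  # per-sex sum of weight
  pct <- 100 * weight / group_total           # share of the group total
})

str(df)  # pct and group_total should show up as plain numeric columns
```

Within each level of sex the pct values should add up to 100, which is
a quick sanity check on the result.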
HTH,
Dennis
On Sat, Jun 18, 2011 at 2:44 AM, Albert-Jan Roskam <fomcl at yahoo.com> wrote:
> Thanks a lot to all who responded. This is a little less confusing now, although
> it's hard for me to fathom the (practical) use of a dataframe within a
> dataframe. If one mixes different notations, or, put in a different way,
> different underlying classes (data.frame vs. numeric), these rather unintuitive
> results appear.
> So I'll use any of these:
> df$pct <- df$weight / ave(df$weight, df$sex, FUN=sum)*100
> df["pct"] <- df["weight"] / ave(df["weight"], df["sex"], FUN=sum)*100
>
> Using str() is very insightful, as is using class().
>
> I'd prefer it if R simply generated an error when one attempts to nest a
> data.frame within a data.frame.
>
> Thanks again!
>
> Cheers!!
> Albert-Jan
>
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> All right, but apart from the sanitation, the medicine, education, wine, public
> order, irrigation, roads, a fresh water system, and public health, what have the
> Romans ever done for us?
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> ________________________________
> From: Brian Diggs <diggsb at ohsu.edu>
> To: R-help at r-project.org
> Sent: Fri, June 17, 2011 11:58:44 PM
> Subject: Re: [R] is this a bug?
>
> On 6/17/2011 2:24 PM, (Ted Harding) wrote:
>> And the extra twist in the tale is exemplified by this
>> mini-version of Albert-Jan's first example:
>>
>> DF<- data.frame(A=c(1,2,3))
>> DF$B<- c(4,5,6)
>> DF$C<- c(7,8,9)
>> DF
>> #   A B C
>> # 1 1 4 7
>> # 2 2 5 8
>> # 3 3 6 9
>>
>> DF$D<- DF["A"]/DF["B"]
>> DF
>> #   A B C    A
>> # 1 1 4 7 0.25
>> # 2 2 5 8 0.40
>> # 3 3 6 9 0.50
>>
>> ##And why:
>>
>> DF["A"]/DF["B"]
>> #      A
>> # 1 0.25
>> # 2 0.40
>> # 3 0.50
>>
>> ##So the ratio DF["A"]/DF["B"] comes out with the name of
>> ##the numerator, "A". This is then the name given to DF$D
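
A small sketch of the naming effect Ted describes, next to one way to
sidestep it. The object names ratio_df and ratio_vec are made up for
illustration, and a two-column DF is used to keep it short.

```r
DF <- data.frame(A = c(1, 2, 3), B = c(4, 5, 6))

# Arithmetic on one-column data frames returns a data frame that
# keeps the column name of the left-hand operand.
ratio_df <- DF["A"] / DF["B"]
names(ratio_df)

# Arithmetic on the extracted columns returns a plain numeric vector,
# so the new column simply takes the name it is assigned to.
ratio_vec <- DF[["A"]] / DF[["B"]]
DF$D <- ratio_vec
names(DF)
```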
>
> It's even slightly weirder than that:
>
> str(DF)
> #'data.frame': 3 obs. of 4 variables:
> # $ A: num 1 2 3
> # $ B: num 4 5 6
> # $ C: num 7 8 9
> # $ D:'data.frame': 3 obs. of 1 variable:
> # ..$ A: num 0.25 0.4 0.5
>
> There is a column D in DF which is itself a data frame with a single
> column whose name is A (because of what Ted said). When formatted for
> printing out, the column name of the inner data frame is used (as a
> result of how data.frame() itself handles named arguments when the
> argument is itself a data.frame: "If a list or data frame or matrix is
> passed to data.frame it is as if each component or column had been
> passed as a separate argument...").
>
> So not a bug, but a convoluted set of circumstances that can happen when
> non-atomic vectors are assigned to columns of a data.frame. That's one
> of those /you shouldn't do that even though it is technically legal or
> at least you shouldn't be surprised when things don't work the way you
> thought they would/ things.
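
For anyone who ends up with such a nested column by accident, a rough
sketch of how it might be detected and flattened, reusing Ted's DF.
The sapply() check and the [[1]] repair are just one possible
approach, and the repair assumes each nested data frame has exactly
one column.

```r
DF <- data.frame(A = c(1, 2, 3))
DF$B <- c(4, 5, 6)
DF$C <- c(7, 8, 9)
DF$D <- DF["A"] / DF["B"]   # D is now a one-column data frame

# Which columns are themselves data frames?
nested <- sapply(DF, is.data.frame)
nested

# Replace each nested one-column data frame by the vector it holds.
# Assumption: every nested data frame has exactly one column.
for (nm in names(DF)[nested]) {
  DF[[nm]] <- DF[[nm]][[1]]
}

str(DF)   # D should now be listed as num
```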
>
>> Thus Albert-Jan's
>> df["weight"] / ave(df["weight"], df["sex"], FUN=sum)*100
>> comes through with name "weight".
>>
>> Ted.
>>
>>
>> On 17-Jun-11 21:06:42, William Dunlap wrote:
>>> df$varname is a column of df.
>>>
>>> df["varname"] is a one-column df containing that column.
>>>
>>> df[["varname"]] is a column of df (same as df$varname).
>>>
>>> df[,"varname"] is a column of df (same as df$varname).
>>>
>>> df[,"varname",drop=FALSE] is a one-column df (same as df["varname"]).
>>>
>>> df$newVarname<- df["varname"] inserts a new component
>>> into df, the component being a one-column data.frame,
>>> not the column in that data.frame.
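
Bill's list can be checked directly with class(). A minimal sketch on
a made-up two-row data frame; the column names and values are
placeholders only.

```r
df <- data.frame(varname = c(1.5, 2.5), other = c("a", "b"))

class(df$varname)                      # a column (numeric vector)
class(df["varname"])                   # a one-column data.frame
class(df[["varname"]])                 # a column
class(df[, "varname"])                 # a column
class(df[, "varname", drop = FALSE])   # a one-column data.frame

# Assigning the one-column data.frame nests it as a component of df
# instead of copying the column it contains.
df$newVarname <- df["varname"]
class(df$newVarname)
```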
>>>
>>> Bill Dunlap
>>> Spotfire, TIBCO Software
>>> wdunlap tibco.com
>>>
>>>> -----Original Message-----
>>>> From: r-help-bounces at r-project.org
>>>> [mailto:r-help-bounces at r-project.org] On Behalf Of Albert-Jan Roskam
>>>> Sent: Friday, June 17, 2011 1:49 PM
>>>> To: R Mailing List
>>>> Subject: [R] is this a bug?
>>>>
>>>> Hello,
>>>>
>>>> Is the following a bug? I always thought that df$varname<-
>>>> does the same as
>>>> df["varname"]<-
>>>>
>>>>> df<- data.frame(weight=round(runif(10, 10, 100)),
>>>>                  sex=round(runif(100, 0, 1)))
>>>>> df$pct<- df["weight"] / ave(df["weight"], df["sex"], FUN=sum)*100
>>>>> names(df)
>>>> [1] "weight" "sex" "pct" ### ----------> ok
>>>>> head(df)
> [[elided Yahoo spam]]
>>>> 1 86 0 2.4002233
>>>> 2 19 1 0.5643006
>>>> 3 32 0 0.8931063
>>>> 4 87 0 2.4281328
>>>> 5 45 0 1.2559308
>>>> 6 95 0 2.6514094
>>>>> rm(df)
>>>>> df<- data.frame(weight=round(runif(10, 10, 100)),
>>>>                  sex=round(runif(100, 0, 1)))
>>>>> df["pct"]<- df["weight"] / ave(df["weight"], df["sex"], FUN=sum)*100   ### -----> this does work
>>>>> names(df)
>>>> [1] "weight" "sex" "pct"
>>>>> head(df)
>>>>   weight sex       pct
>>>> 1     15   0 0.5246590
>>>> 2     43   0 1.5040224
>>>> 3     17   1 0.9284544
>>>> 4     44   1 2.4030584
>>>> 5     76   1 4.1507373
>>>> 6     59   0 2.0636586
>>>>> do.call(c, R.Version())
>>>> platform arch
>>>> "i686-pc-linux-gnu" "i686"
>>>> os system
>>>> "linux-gnu" "i686, linux-gnu"
>>>> status major
>>>> "" "2"
>>>> minor year
>>>> "11.1" "2010"
>>>> month day
>>>> "05" "31"
>>>> svn rev language
>>>> "52157" "R"
>>>> version.string
>>>> "R version 2.11.1 (2010-05-31)"
>>>>> # Thanks!
>>>>
>>>> Cheers!!
>>>> Albert-Jan
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>
>> --------------------------------------------------------------------
>> E-Mail: (Ted Harding)<ted.harding at wlandres.net>
>> Fax-to-email: +44 (0)870 094 0861
>> Date: 17-Jun-11 Time: 22:24:41
>> ------------------------------ XFMail ------------------------------
>>
>
>
> --
> Brian S. Diggs, PhD
> Senior Research Associate, Department of Surgery
> Oregon Health & Science University