[R] Bug in by() function which works for some FUN argument and does not work for others

Akhilesh Singh akhileshsingh.igkv at gmail.com
Sun Apr 17 17:11:26 CEST 2016


Dear All,

Yes, I certainly now agree with the suggestion of Adrian Dusa for using
colMeans in place of mean in the situation that I had reported to r-help.
And I am sorry that I did not personally extend my thanks to him. I really
wish to thank him for his suggestion, and I do this now.

However, for the future I wanted a way to apply functions more complex than
mean, e.g. a function for skewness or kurtosis and the like, within the
by() function. The function colMeans is applicable to the mean only.
That is why I later arrived at the solution quoted below.
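
A minimal sketch of how that wrapper approach extends beyond the mean. The
skewness() helper is a hypothetical moment-based definition written only for
illustration (it is not part of base R), and the built-in mtcars data stands
in for the brain data set:

```r
# Hypothetical helper (not part of base R): moment-based sample skewness.
skewness <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# by() hands FUN one data frame per group; sapply() then applies the
# statistic to every column. na.rm = TRUE travels through by()'s `...`.
cols <- c("mpg", "hp", "wt")
res_by <- by(mtcars[, cols], INDICES = list(am = mtcars$am),
             FUN = function(d, ...) sapply(d, skewness, ...),
             na.rm = TRUE)
res_by
```

Any function that accepts a numeric vector could be substituted for
skewness() here.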

Yet, amid all these deliberations, I wish to mention the suggestion of
Dr. Jim Lemon, who proposed the following code, which wraps by() in
sapply() and neatly takes advantage of R's positional matching of
arguments. It met my objective nicely for my future project of using more
complex functions in lieu of mean:

sapply(brain[,-1],by,brain$Gender,mean,na.rm=TRUE)
       FSIQ    VIQ    PIQ   Weight   Height MRI_Count
Female 111.9 109.45 110.45 137.2000 65.76500  862654.6
Male   115.0 115.25 111.60 166.4444 71.43158  954855.4
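
The same pattern should carry over to other statistics. The sketch below
assumes a hypothetical moment-based kurtosis() helper (written here for
illustration, not taken from any package) and uses the built-in mtcars data
in place of the brain data set:

```r
# Hypothetical helper (not part of base R): moment-based kurtosis.
kurtosis <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2
}

# sapply() supplies each column as the first argument (data) of by();
# mtcars$am and kurtosis are then matched to INDICES and FUN by position,
# and na.rm = TRUE is passed on to kurtosis() through `...`.
res <- sapply(mtcars[, c("mpg", "hp", "wt")], by, mtcars$am, kurtosis,
              na.rm = TRUE)
res
```

The result is a matrix with one row per group and one column per variable,
in the same layout as the output of means above.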

Secondly, I also wish to apologize for having used the word "bug" for the
by() function instead of considering that it could be my own mistake; I
should simply have sought help from r-help rather than calling it a "bug".
If this has hurt anybody's feelings, I express my regret and offer my
apologies to all of them.

With best regards,

Dr. A.K. Singh
Head, Department of Agril. Statistics
Indira Gandhi Krishi Vishwavidyalaya, Raipur
Chhattisgarh, India, PIN-492012
Mobile: +919752620740
Email: akhileshsingh.igkv at gmail.com



On Sun, Apr 17, 2016 at 7:52 AM, David Winsemius <dwinsemius at comcast.net>
wrote:

>
> > On Apr 16, 2016, at 2:03 AM, Akhilesh Singh <
> akhileshsingh.igkv at gmail.com> wrote:
> >
> > Dear All,
> >
> > I have got your core message, that it is my responsibility to determine
> whether any particular function in my version of R satisfies the language
> requirements at the time of your use. Jim Albert and Maria Rizzo must have
> used their code, which was permitted in the R-code of their time (2012).
> >
> > Therefore, I have now modified my R-code, as per R-3.2.4 version,
> > according to my requirement as follows, which is working for my 'brain'
> > data set, whose output is reproduced below for your information please:
> >
> > > by(brain[,-1], INDICES=list(Gender=brain$Gender), FUN=function(x, na.rm=FALSE) sapply(x, mean, na.rm=na.rm), na.rm=TRUE)
> > Gender: Female
> >       FSIQ        VIQ        PIQ     Weight     Height  MRI_Count
> >    111.900    109.450    110.450    137.200     65.765 862654.600
> > ------------------------------------------------------------
> > Gender: Male
> >         FSIQ          VIQ          PIQ       Weight       Height    MRI_Count
> >    115.00000    115.25000    111.60000    166.44444     71.43158 954855.40000
>
> Yes, that is certainly a workable alternative, although I thought the
> question of "how to do it" had been effectively answered with the
> suggestion from Adrian Dusa to use colMeans. It, too, has an `na.rm=TRUE`
> option.
>
> I was only responding to your plaintive complaint that the current version
> of R had a "bug" because it was not behaving as promised by an introductory
> text with a three year-old publishing date.
>
> --
> David.
>
>
> >
> > With best regards,
> >
> > Dr. A.K. Singh
> >
> > On Fri, Apr 15, 2016 at 2:24 PM, David Winsemius <dwinsemius at comcast.net>
> wrote:
> >
> > > On Apr 15, 2016, at 1:16 AM, Akhilesh Singh <
> akhileshsingh.igkv at gmail.com> wrote:
> > >
> > > Dear All,
> > >
> > > Thanks for your help. However, I would like to draw your attention to
> > > the following:
> > >
> > > Actually, I was replicating Example 2.3, using the dataset
> > > "brainsize.txt" given in Section 2.3.3 ("Summarize by group") on page 55
> > > of the well-known book "R by Example" by Jim Albert and Maria Rizzo,
> > > published by Springer (2012) in the Use R! series. The output of the
> > > by() function printed in the book is reproduced below for the
> > > information of all:
> > >
> > >> by(data=brain[, -1], INDICES=brain$Gender, FUN=mean, na.rm=TRUE)
> > > brain$Gender: Female
> > > FSIQ VIQ PIQ Weight Height MRI_Count
> > > 111.900 109.450 110.450 137.200 65.765 862654.600
> > > ------------------------------------------------------------
> > > brain$Gender: Male
> > > FSIQ  VIQ    PIQ       Weight    Height   MRI_Count
> > > 115.00000 115.25000 111.60000 166.44444 71.43158 954855.40000
> > >
> > >
> > > I do not know how could the writers of the book have produced the above
> > > results by by() function.
> >
> >
> > There was in the not-so-distant past a function named `mean.data.frame`
> which would have "worked" in that instance. That function was removed. I
> thought you could  find the exact date of that action by searching the NEWS
> but failed. Reviewing the citations of `mean.data.frame` in the r-help
> archives I see that users were being warned that its use was deprecated in
> mid 2012.  It's very possible that the authors of a book in 2012 were using
> an earlier version of R that had that facility available to them before it
> was deprecated. With a more than current version of R 3.3.0 and a modest
> number of loaded packages I see this:
> >
> > > methods(mean)
> >  [1] mean,ANY-method          mean,Matrix-method       mean,Raster-method
> >  [4] mean,sparseMatrix-method mean,sparseVector-method mean.Date
> >  [7] mean.default             mean.difftime            mean.POSIXct
> > [10] mean.POSIXlt             mean.yearmon*            mean.yearqtr*
> > [13] mean.zoo*
> >
> > It is your responsibility to determine whether any particular function
> in your version of R satisfies the language requirements at the time of
> your use. Jim Albert and Maria Rizzo do not set the standards for what is
> an evolving piece of software.
> >
> > --
> > David.
> >
> >
> > > But, when I could not reproduce these results,
> > > then I thought that this could possibly be due to some missing
> > > values (NAs) in the Weight and Height variables. Then I tried the above
> > > code on the "mtcars" dataset with INDICES=mtcars$am. When I found the
> > > same results there too, I reported the case to "r-help at R-project.org".
> > >
> > > With best regards,
> > >
> > > Dr. A.K. Singh
> > >
> > > On Fri, Apr 15, 2016 at 3:06 AM, Adrian Dușa <dusa.adrian at unibuc.ro>
> wrote:
> > >
> > >> I think you are not using the best function for what your intentions
> are.
> > >> Try:
> > >>
> > >>> by(data=mtcars, INDICES=list(as.factor(mtcars$am)), FUN=colMeans)
> > >> : 0
> > >>        mpg         cyl        disp          hp        drat          wt
> > >>     qsec          vs
> > >> 17.1473684   6.9473684 290.3789474 160.2631579   3.2863158   3.7688947
> > >> 18.1831579   0.3684211
> > >>         am        gear        carb
> > >>  0.0000000   3.2105263   2.7368421
> > >>
> > >>
> ---------------------------------------------------------------------------
> > >> : 1
> > >>        mpg         cyl        disp          hp        drat          wt
> > >>     qsec          vs
> > >> 24.3923077   5.0769231 143.5307692 126.8461538   4.0500000   2.4110000
> > >> 17.3600000   0.5384615
> > >>         am        gear        carb
> > >>  1.0000000   4.3846154   2.9230769
> > >>
> > >> See the difference between colMeans() and mean() in their respective
> help
> > >> files.
> > >> Hth,
> > >> Adrian
> > >>
> > >> On Thu, Apr 14, 2016 at 11:14 PM, Akhilesh Singh <
> > >> akhileshsingh.igkv at gmail.com> wrote:
> > >>
> > >>> Dear Sirs,
> > >>>
> > >>> I am Professor at Indira Gandhi Krishi Vishwavidyalaya, Raipur,
> > >>> Chhattisgarh, India.
> > >>>
> > >>> While taking classes, I found the by() function producing the following
> > >>> error when I use FUN=mean or median and some other functions; however,
> > >>> FUN=summary works.
> > >>>
> > >>> Given below is the output of the example I used on a built-in dataset
> > >>> "mtcars", along with error message reproduced herewith:
> > >>>
> > >>>> by(data=mtcars, INDICES=list(mtcars$am), FUN=mean)
> > >>> : 0
> > >>> [1] NA
> > >>> ------------------------------------------------------------
> > >>> : 1
> > >>> [1] NA
> > >>> Warning messages:
> > >>> 1: In mean.default(data[x, , drop = FALSE], ...) :
> > >>>  argument is not numeric or logical: returning NA
> > >>> 2: In mean.default(data[x, , drop = FALSE], ...) :
> > >>>  argument is not numeric or logical: returning NA
> > >>>
> > >>> However, the same by() function works for FUN=summary; the output is
> > >>> given below:
> > >>>
> > >>>> by(data=mtcars, INDICES=list(mtcars$am), FUN=summary)
> > >>> : 0
> > >>>      mpg             cyl             disp             hp
> > >>> Min.   :10.40   Min.   :4.000   Min.   :120.1   Min.   : 62.0
> > >>> 1st Qu.:14.95   1st Qu.:6.000   1st Qu.:196.3   1st Qu.:116.5
> > >>> Median :17.30   Median :8.000   Median :275.8   Median :175.0
> > >>> Mean   :17.15   Mean   :6.947   Mean   :290.4   Mean   :160.3
> > >>> 3rd Qu.:19.20   3rd Qu.:8.000   3rd Qu.:360.0   3rd Qu.:192.5
> > >>> Max.   :24.40   Max.   :8.000   Max.   :472.0   Max.   :245.0
> > >>>      drat             wt             qsec             vs               am
> > >>> Min.   :2.760   Min.   :2.465   Min.   :15.41   Min.   :0.0000   Min.   :0
> > >>> 1st Qu.:3.070   1st Qu.:3.438   1st Qu.:17.18   1st Qu.:0.0000   1st Qu.:0
> > >>> Median :3.150   Median :3.520   Median :17.82   Median :0.0000   Median :0
> > >>> Mean   :3.286   Mean   :3.769   Mean   :18.18   Mean   :0.3684   Mean   :0
> > >>> 3rd Qu.:3.695   3rd Qu.:3.842   3rd Qu.:19.17   3rd Qu.:1.0000   3rd Qu.:0
> > >>> Max.   :3.920   Max.   :5.424   Max.   :22.90   Max.   :1.0000   Max.   :0
> > >>>      gear            carb
> > >>> Min.   :3.000   Min.   :1.000
> > >>> 1st Qu.:3.000   1st Qu.:2.000
> > >>> Median :3.000   Median :3.000
> > >>> Mean   :3.211   Mean   :2.737
> > >>> 3rd Qu.:3.000   3rd Qu.:4.000
> > >>> Max.   :4.000   Max.   :4.000
> > >>> ------------------------------------------------------------
> > >>> : 1
> > >>>      mpg             cyl             disp             hp             drat
> > >>> Min.   :15.00   Min.   :4.000   Min.   : 71.1   Min.   : 52.0   Min.   :3.54
> > >>> 1st Qu.:21.00   1st Qu.:4.000   1st Qu.: 79.0   1st Qu.: 66.0   1st Qu.:3.85
> > >>> Median :22.80   Median :4.000   Median :120.3   Median :109.0   Median :4.08
> > >>> Mean   :24.39   Mean   :5.077   Mean   :143.5   Mean   :126.8   Mean   :4.05
> > >>> 3rd Qu.:30.40   3rd Qu.:6.000   3rd Qu.:160.0   3rd Qu.:113.0   3rd Qu.:4.22
> > >>> Max.   :33.90   Max.   :8.000   Max.   :351.0   Max.   :335.0   Max.   :4.93
> > >>>       wt             qsec             vs               am         gear
> > >>> Min.   :1.513   Min.   :14.50   Min.   :0.0000   Min.   :1   Min.   :4.000
> > >>> 1st Qu.:1.935   1st Qu.:16.46   1st Qu.:0.0000   1st Qu.:1   1st Qu.:4.000
> > >>> Median :2.320   Median :17.02   Median :1.0000   Median :1   Median :4.000
> > >>> Mean   :2.411   Mean   :17.36   Mean   :0.5385   Mean   :1   Mean   :4.385
> > >>> 3rd Qu.:2.780   3rd Qu.:18.61   3rd Qu.:1.0000   3rd Qu.:1   3rd Qu.:5.000
> > >>> Max.   :3.570   Max.   :19.90   Max.   :1.0000   Max.   :1   Max.   :5.000
> > >>>      carb
> > >>> Min.   :1.000
> > >>> 1st Qu.:1.000
> > >>> Median :2.000
> > >>> Mean   :2.923
> > >>> 3rd Qu.:4.000
> > >>> Max.   :8.000
> > >>>>
> > >>>
> > >>> I am using the latest version, R-3.2.4 on Windows; however, this error
> > >>> is generated in the previous version too.
> > >>>
> > >>> I hope this report will get serious attention in debugging.
> > >>>
> > >>> With best regards,
> > >>>
> > >>> Dr. A.K. Singh
> > >>>
> > >>>        [[alternative HTML version deleted]]
> > >>>
> > >>> ______________________________________________
> > >>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > >>> https://stat.ethz.ch/mailman/listinfo/r-help
> > >>> PLEASE do read the posting guide
> > >>> http://www.R-project.org/posting-guide.html
> > >>> and provide commented, minimal, self-contained, reproducible code.
> > >>>
> > >>
> > >>
> > >>
> > >> --
> > >> Adrian Dusa
> > >> University of Bucharest
> > >> Romanian Social Data Archive
> > >> Soseaua Panduri nr.90
> > >> 050663 Bucharest sector 5
> > >> Romania
> > >>
> > >
> >
> > David Winsemius
> > Alameda, CA, USA
> >
> >
