[R] Bug in by() function which works for some FUN argument and does not work for others

Akhilesh Singh akhileshsingh.igkv at gmail.com
Fri Apr 15 10:16:54 CEST 2016


Dear All,

Thanks for your help. However, I would like to draw your attention to the
following:

I was replicating Example 2.3, which uses the dataset "brainsize.txt", from
Section 2.3.3 ("Summarize by group") on page 55 of the well-known book "R by
Example" by Jim Albert and Maria Rizzo, published by Springer (2012) in the
Use R! series. The output of the by() function as printed in the book is
reproduced below for everyone's information:

> by(data=brain[, -1], INDICES=brain$Gender, FUN=mean, na.rm=TRUE)
brain$Gender: Female
      FSIQ        VIQ        PIQ     Weight     Height  MRI_Count
   111.900    109.450    110.450    137.200     65.765 862654.600
------------------------------------------------------------
brain$Gender: Male
        FSIQ          VIQ          PIQ       Weight       Height    MRI_Count
   115.00000    115.25000    111.60000    166.44444     71.43158 954855.40000
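
For readers following along, here is a minimal sketch of one way to get
per-group column means of this kind in a current R session. It is not the
book's code: brain_demo is a small made-up data frame standing in for
"brainsize.txt" (its values are invented for illustration), and colMeans()
is swapped in for mean() as the function applied to each group.

```r
# Hypothetical stand-in for the "brainsize.txt" data: a grouping column
# followed by numeric columns, with one missing value in Weight.
brain_demo <- data.frame(
  Gender = c("Female", "Female", "Male", "Male"),
  FSIQ   = c(133, 139, 140, 133),
  Weight = c(118, 136, NA, 172),
  Height = c(64.5, 68.0, 72.5, 68.8)
)

# colMeans() accepts a data frame of numeric columns, and by() forwards
# na.rm = TRUE to it through its ... argument.
res <- by(data = brain_demo[, -1], INDICES = brain_demo$Gender,
          FUN = colMeans, na.rm = TRUE)
print(res)
```

Because by() passes extra arguments on to FUN, na.rm = TRUE reaches
colMeans(), so the missing Weight is skipped instead of turning the Male
mean into NA.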


I do not know how the authors of the book produced the above results with
the by() function. When I could not reproduce them, I first thought this
might be due to the missing values (NAs) in the Weight and Height variables.
I then tried the same code on the "mtcars" dataset with INDICES=mtcars$am.
When the same problem appeared there too, I reported the case to
"r-help at R-project.org".
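
The behaviour described above can be checked directly on "mtcars". The
sketch below is an illustration, not part of the original report: it shows
mean() returning NA for a data-frame subset, and two alternatives, sapply()
inside by() and aggregate(), that yield per-group column means. The names
grp0, m, cm, res and agg are chosen here for illustration only. A plausible
explanation for the book's output, stated as an assumption, is that older R
releases had a data-frame method for mean() that has since been removed.

```r
# Subset that by() would hand to FUN for the am == 0 group.
grp0 <- mtcars[mtcars$am == 0, ]

# mean() falls through to mean.default(), which does not accept a data
# frame: it warns and returns NA (the warning is suppressed here).
m <- suppressWarnings(mean(grp0))

# colMeans() does accept a data frame with all-numeric columns.
cm <- colMeans(grp0)

# Alternative 1: apply mean() column by column within each group.
res <- by(mtcars, mtcars$am, function(d) sapply(d, mean, na.rm = TRUE))

# Alternative 2: aggregate() returns the group means as a data frame.
agg <- aggregate(. ~ am, data = mtcars, FUN = mean)

print(m)
print(res)
print(agg)
```

summary() works in the original call because it has a method for data
frames, whereas mean() only sees an object that is neither numeric nor
logical.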

With best regards,

Dr. A.K. Singh
Head, Department of Agril. Statistics
Indira Gandhi Krishi Vishwavidyalaya, Raipur
Chhattisgarh, India, PIN-492012
Mobile: +919752620740
Email: akhileshsingh.igkv at gmail.com

On Fri, Apr 15, 2016 at 3:06 AM, Adrian Dușa <dusa.adrian at unibuc.ro> wrote:

> I think you are not using the best function for what you intend to do.
> Try:
>
> > by(data=mtcars, INDICES=list(as.factor(mtcars$am)), FUN=colMeans)
> : 0
>         mpg         cyl        disp          hp        drat          wt        qsec          vs
>  17.1473684   6.9473684 290.3789474 160.2631579   3.2863158   3.7688947  18.1831579   0.3684211
>          am        gear        carb
>   0.0000000   3.2105263   2.7368421
>
> ---------------------------------------------------------------------------
> : 1
>         mpg         cyl        disp          hp        drat          wt        qsec          vs
>  24.3923077   5.0769231 143.5307692 126.8461538   4.0500000   2.4110000  17.3600000   0.5384615
>          am        gear        carb
>   1.0000000   4.3846154   2.9230769
>
> See the difference between colMeans() and mean() in their respective help
> files.
> Hth,
> Adrian
>
> On Thu, Apr 14, 2016 at 11:14 PM, Akhilesh Singh <
> akhileshsingh.igkv at gmail.com> wrote:
>
>> Dear Sirs,
>>
>> I am a Professor at Indira Gandhi Krishi Vishwavidyalaya, Raipur,
>> Chhattisgarh, India.
>>
>> While taking classes, I found that the by() function produces the
>> following error when I use FUN=mean, FUN=median, or some other functions;
>> FUN=summary, however, works.
>>
>> Given below is the output of the example I ran on the built-in dataset
>> "mtcars", along with the error message:
>>
>> > by(data=mtcars, INDICES=list(mtcars$am), FUN=mean)
>> : 0
>> [1] NA
>> ------------------------------------------------------------
>> : 1
>> [1] NA
>> Warning messages:
>> 1: In mean.default(data[x, , drop = FALSE], ...) :
>>   argument is not numeric or logical: returning NA
>> 2: In mean.default(data[x, , drop = FALSE], ...) :
>>   argument is not numeric or logical: returning NA
>>
>> However, the same by() call works with FUN=summary; the output is given
>> below:
>>
>> > by(data=mtcars, INDICES=list(mtcars$am), FUN=summary)
>> : 0
>>       mpg             cyl             disp             hp
>>  Min.   :10.40   Min.   :4.000   Min.   :120.1   Min.   : 62.0
>>  1st Qu.:14.95   1st Qu.:6.000   1st Qu.:196.3   1st Qu.:116.5
>>  Median :17.30   Median :8.000   Median :275.8   Median :175.0
>>  Mean   :17.15   Mean   :6.947   Mean   :290.4   Mean   :160.3
>>  3rd Qu.:19.20   3rd Qu.:8.000   3rd Qu.:360.0   3rd Qu.:192.5
>>  Max.   :24.40   Max.   :8.000   Max.   :472.0   Max.   :245.0
>>       drat             wt             qsec             vs               am
>>  Min.   :2.760   Min.   :2.465   Min.   :15.41   Min.   :0.0000   Min.   :0
>>  1st Qu.:3.070   1st Qu.:3.438   1st Qu.:17.18   1st Qu.:0.0000   1st Qu.:0
>>  Median :3.150   Median :3.520   Median :17.82   Median :0.0000   Median :0
>>  Mean   :3.286   Mean   :3.769   Mean   :18.18   Mean   :0.3684   Mean   :0
>>  3rd Qu.:3.695   3rd Qu.:3.842   3rd Qu.:19.17   3rd Qu.:1.0000   3rd Qu.:0
>>  Max.   :3.920   Max.   :5.424   Max.   :22.90   Max.   :1.0000   Max.   :0
>>       gear            carb
>>  Min.   :3.000   Min.   :1.000
>>  1st Qu.:3.000   1st Qu.:2.000
>>  Median :3.000   Median :3.000
>>  Mean   :3.211   Mean   :2.737
>>  3rd Qu.:3.000   3rd Qu.:4.000
>>  Max.   :4.000   Max.   :4.000
>> ------------------------------------------------------------
>> : 1
>>       mpg             cyl             disp             hp             drat
>>  Min.   :15.00   Min.   :4.000   Min.   : 71.1   Min.   : 52.0   Min.   :3.54
>>  1st Qu.:21.00   1st Qu.:4.000   1st Qu.: 79.0   1st Qu.: 66.0   1st Qu.:3.85
>>  Median :22.80   Median :4.000   Median :120.3   Median :109.0   Median :4.08
>>  Mean   :24.39   Mean   :5.077   Mean   :143.5   Mean   :126.8   Mean   :4.05
>>  3rd Qu.:30.40   3rd Qu.:6.000   3rd Qu.:160.0   3rd Qu.:113.0   3rd Qu.:4.22
>>  Max.   :33.90   Max.   :8.000   Max.   :351.0   Max.   :335.0   Max.   :4.93
>>        wt             qsec             vs               am         gear
>>  Min.   :1.513   Min.   :14.50   Min.   :0.0000   Min.   :1   Min.   :4.000
>>  1st Qu.:1.935   1st Qu.:16.46   1st Qu.:0.0000   1st Qu.:1   1st Qu.:4.000
>>  Median :2.320   Median :17.02   Median :1.0000   Median :1   Median :4.000
>>  Mean   :2.411   Mean   :17.36   Mean   :0.5385   Mean   :1   Mean   :4.385
>>  3rd Qu.:2.780   3rd Qu.:18.61   3rd Qu.:1.0000   3rd Qu.:1   3rd Qu.:5.000
>>  Max.   :3.570   Max.   :19.90   Max.   :1.0000   Max.   :1   Max.   :5.000
>>       carb
>>  Min.   :1.000
>>  1st Qu.:1.000
>>  Median :2.000
>>  Mean   :2.923
>>  3rd Qu.:4.000
>>  Max.   :8.000
>> >
>>
>> I am using the latest version, R-3.2.4 on Windows; however, this error
>> is also generated in the previous version.
>>
>> I hope this report will get serious attention for debugging.
>>
>> With best regards,
>>
>> Dr. A.K. Singh
>> Head, Department of Agril. Statistics
>> Indira Gandhi Krishi Vishwavidyalaya, Raipur
>> Chhattisgarh, India, PIN-492012
>> Mobile: +919752620740
>> Email: akhileshsingh.igkv at gmail.com
>>
>
>
>
> --
> Adrian Dusa
> University of Bucharest
> Romanian Social Data Archive
> Soseaua Panduri nr.90
> 050663 Bucharest sector 5
> Romania
>
