[R] Odd behaviour of mean() with a numeric column in a tibble

Chris Evans chrishold at psyctc.org
Tue Dec 6 23:10:15 CET 2016


{{SIGH}} 

You are absolutely right. 

I wonder if I am losing some cognitive capacities that are needed to be part of the evolving R community. It seems to me that if a tibble is designed to be an enhanced replacement for a dataframe then it shouldn't quite so radically change things. 

I notice that the documentation on tibble says "[ Never simplifies (drops), so always returns data.frame" 
That is much less explicit than I would have liked and doesn't actually seem to be true. As you rightly say, it generally, but not quite always, returns a tibble, and it can be fooled into returning a vector of length 1. 

> tmpTibble[[1,]] 
Error in `[[.data.frame`(tmpTibble, 1, ) : 
argument "..2" is missing, with no default 

> tmpTibble[1] 
# A tibble: 26 × 1 
ID 
<chr> 
1 a 
2 b 
3 c 
4 d 
5 e 
6 f 
7 g 
8 h 
9 i 
10 j 
# ... with 16 more rows 
> tmpTibble[,1] 
# A tibble: 26 × 1 
ID 
<chr> 
1 a 
2 b 
3 c 
4 d 
5 e 
6 f 
7 g 
8 h 
9 i 
10 j 
# ... with 16 more rows 
> tmpTibble[1,] 
Error in `[<-.data.frame`(`*tmp*`, , value = list(ID = c("a", "a", "a", : 
replacement element 3 is a matrix/data frame of 26 rows, need 1 
In addition: Warning messages: 
1: In `[<-.data.frame`(`*tmp*`, , value = list(ID = c("a", "a", "a", : 
replacement element 1 has 26 rows to replace 1 rows 
2: In `[<-.data.frame`(`*tmp*`, , value = list(ID = c("a", "a", "a", : 
replacement element 2 has 26 rows to replace 1 rows 
> tmpTibble[1,1:26] 
Error: Invalid column indexes: 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26 
> tmpTibble[[1,2]] 
[1] 1 
> str(tmpTibble[[1,2]]) 
int 1 
> str(tmpTibble[[1:2,2]]) 
Error in col[[i, exact = exact]] : 
attempt to select more than one element in vectorIndex 
> 
> tmpTibble[[1,1:2]] 
[1] "b" 
> 

So [[a,b]] works if a and b are legal for the dimensions of the tibble and a is of length 1, but it returns NOT a tibble but a vector of length 1 (I think). I can see that's logical, but it's not what the documentation says. 

[[a]] and [[,a]] return the same result; that seems excessively tolerant to me. 

[[a,b:c]] actually returns [[a,c]] and again as a single value, NOT a tibble. 

And row subsetting/indexing has gone. 

Why create a replacement for a dataframe that has no row indexing and so radically redefines column indexing, in fact redefines the whole of indexing and subsetting? 
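
For comparison, here is a minimal sketch of the same indexing calls on a freshly built tibble that holds only atomic columns. The name fresh is hypothetical and is not from the session above. The error messages above mention a third element that is a data frame, so the tmpTibble in that transcript may still have carried the newNum column assigned earlier, which would be one possible reason its row indexing failed. Details may differ between tibble versions.

```r
library(tibble)

# Hypothetical fresh tibble with atomic columns only
fresh <- tibble(ID = letters, num = 1:26)

col_tbl <- fresh[, 2]     # single bracket: still a tibble with one column
col_vec <- fresh[[2]]     # double bracket: the bare integer vector
cell    <- fresh[[1, 2]]  # one row of one column: a length-1 vector
row_tbl <- fresh[1, ]     # row subsetting: a tibble with one row

inherits(col_tbl, "tbl_df")  # is the single-bracket result a tibble?
is.integer(col_vec)          # is the double-bracket result a plain vector?
dim(row_tbl)                 # rows and columns of the one-row result
mean(col_vec)                # mean() now receives a numeric vector
```

The design choice on show is that [ keeps the container type while [[ extracts its contents, so summary functions such as mean() and median() want the [[ form.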

OK. I will go to sleep now and hope to feel less dumb(ed) when I wake. Perhaps Prof. Wickham or someone can spell out a bit less tersely, and I think incompletely, than the tibble documentation does, why all this is good. 

Thanks anyway Ista, you certainly hit the issue! 

Very best all, 

Chris 

> From: "Ista Zahn" <istazahn at gmail.com>
> To: "Chris Evans" <chrishold at psyctc.org>
> Cc: "r-help at r-project.org" <r-help at r-project.org>
> Sent: Tuesday, 6 December, 2016 21:40:41
> Subject: Re: [R] Odd behaviour of mean() with a numeric column in a tibble

> Not at a computer to check right now, but I believe single bracket indexing a
> tibble always returns a tibble. To extract a vector use [[

> On Dec 6, 2016 4:28 PM, "Chris Evans" <chrishold at psyctc.org> wrote:

> > I hope I am obeying the list rules here. I am using a raw R IDE for this and
> > running 3.3.2 (2016-10-31) on x86_64-w64-mingw32/x64 (64-bit)

> > Here is a reproducible example. Code only first

> > require(tibble)
> > tmpTibble <- tibble(ID=letters,num=1:26)
> > min(tmpTibble[,2]) # fine
> > max(tmpTibble[,2]) # fine
> > median(tmpTibble[,2]) # not fine
> > mean(tmpTibble[,2]) # not fine

> I think you want

> mean(tmpTibble[[2]])

> > newMeanFun <- function(x) {mean(as.numeric(unlist(x)))}
> > newMeanFun(tmpTibble[,2]) # solved problem but surely shouldn't be necessary?!
> > newMedianFun <- function(x) {median(as.numeric(unlist(x)))}
> > newMedianFun(tmpTibble[,2]) # ditto
> > str(tmpTibble[,2])

> > ### then I tried this to make sure it wasn't about having fed in integers

> > tmpTibble2 <- tibble(ID=letters,num=1:26,num2=(1:26)/10)
> > tmpTibble2
> > mean(tmpTibble2[,3]) # not fine, not about integers!


> > ### before I just created tmpTibble2 I found myself trying to add a column to tmpTibble
> > tmpTibble$newNum <- tmpTibble[,2]/10 # NO!
> > tmpTibble[["newNum"]] <- tmpTibble[,2]/10 # NO!
> > ### and oddly enough ...
> > add_column(tmpTibble,newNum = tmpTibble[,2]/10) # NO!

> > Now here it is with the output:

> > > require(tibble)
> > Loading required package: tibble
> > > tmpTibble <- tibble(ID=letters,num=1:26)
> > > min(tmpTibble[,2]) # fine
> > [1] 1
> > > max(tmpTibble[,2]) # fine
> > [1] 26
> > > median(tmpTibble[,2]) # not fine
> > Error in median.default(tmpTibble[, 2]) : need numeric data
> > > mean(tmpTibble[,2]) # not fine
> > [1] NA
> > Warning message:
> > In mean.default(tmpTibble[, 2]) :
> > argument is not numeric or logical: returning NA
> > > newMeanFun <- function(x) {mean(as.numeric(unlist(x)))}
> > > newMeanFun(tmpTibble[,2]) # solved problem but surely shouldn't be necessary?!
> > [1] 13.5
> > > newMedianFun <- function(x) {median(as.numeric(unlist(x)))}
> > > newMedianFun(tmpTibble[,2]) # ditto
> > [1] 13.5
> > > str(tmpTibble[,2])
> > Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 26 obs. of 1 variable:
> > $ num: int 1 2 3 4 5 6 7 8 9 10 ...

> > > ### then I tried this to make sure it wasn't about having fed in integers

> > > tmpTibble2 <- tibble(ID=letters,num=1:26,num2=(1:26)/10)
> > > tmpTibble2
> > # A tibble: 26 × 3
> > ID num num2
> > <chr> <int> <dbl>
> > 1 a 1 0.1
> > 2 b 2 0.2
> > 3 c 3 0.3
> > 4 d 4 0.4
> > 5 e 5 0.5
> > 6 f 6 0.6
> > 7 g 7 0.7
> > 8 h 8 0.8
> > 9 i 9 0.9
> > 10 j 10 1.0
> > # ... with 16 more rows
> > > mean(tmpTibble2[,3]) # not fine, not about integers!
> > [1] NA
> > Warning message:
> > In mean.default(tmpTibble2[, 3]) :
> > argument is not numeric or logical: returning NA


> > > ### before I just created tmpTibble2 I found myself trying to add a column to tmpTibble
> > > tmpTibble$newNum <- tmpTibble[,2]/10 # NO!
> > > tmpTibble[["newNum"]] <- tmpTibble[,2]/10 # NO!
> > > ### and oddly enough ...
> > > add_column(tmpTibble,newNum = tmpTibble[,2]/10) # NO!
> > Error: Each variable must be a 1d atomic vector or list.
> > Problem variables: 'newNum'
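
A hedged sketch of one way to add such a column, assuming the failures above arise because tmpTibble[,2]/10 is itself a one-column data frame rather than a vector. Extracting the column with [[ first gives a plain numeric vector to work with. The names tmp and tmp2 are hypothetical.

```r
library(tibble)

# Hypothetical copy of the example data
tmp <- tibble(ID = letters, num = 1:26)

# [[ returns the bare vector, so the division yields a numeric vector
tmp$newNum <- tmp[[2]] / 10

# add_column() accepts the same kind of vector
tmp2 <- add_column(tibble(ID = letters, num = 1:26),
                   newNum = (1:26) / 10)

str(tmp$newNum)
```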



> > I discovered this when I hit odd behaviour after using read_spss() from the
> > haven package for the first time as it seemed to be offering a step forward
> > over good old read.spss() from the excellent foreign package. I am reporting it
> > here not directly to Prof. Wickham as the issues seem rather general though I'm
> > guessing that it needs to be fixed with a fix to tibble. Or perhaps I've
> > completely missed something.

> > TIA,

> > Chris

