[R] Odd behaviour of mean() with a numeric column in a tibble

Jeff Newmiller jdnewmil at dcn.davis.ca.us
Wed Dec 7 00:23:28 CET 2016


You really need sleep. Then you need to read

?`[[`

and in particular read about the second argument to the `[[` function, since you don't seem to understand what it is for. Maybe reread the "An Introduction to R" document that comes with R.

The simplest solution is to treat `[[` as supporting one index and `[` as supporting either one or two. 
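
A minimal sketch of that rule of thumb, using a base-R data frame as a stand-in for the tibble in the thread (the name `dta` and its columns are illustrative only, mirroring the original example). Note that base data frames drop `dta[, 2]` to a vector by default, whereas the thread shows a tibble keeping it as a tibble; `drop = FALSE` is used here so the base-R result has the same shape.

```r
# Illustrative data: same shape as tmpTibble in the quoted example.
dta <- data.frame(ID = letters, num = 1:26, stringsAsFactors = FALSE)

# `[[` with ONE index extracts a single column as a plain vector.
v <- dta[[2]]                      # same column as dta[["num"]]
col_mean <- mean(v)                # works, because v is an integer vector

# `[` with ONE index selects columns and returns a data frame.
one <- dta[2]

# `[` with TWO indices selects rows and columns.
# drop = FALSE keeps the result a data frame instead of a vector.
two <- dta[1:3, 2, drop = FALSE]

print(col_mean)
print(class(one))
print(dim(two))
```

With this convention, summary functions such as mean() and median() are given `dta[[2]]` rather than `dta[, 2]`, so they always receive a vector regardless of whether `dta` is a data frame or a tibble.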

As for expecting any form of row indexing of data frames or tibbles to return a vector, that is hopeless because each column can have a different type.  dta[ 1, ] returns exactly what it has to return to avoid losing fidelity. If you really need row indexing to return a vector you should be using a matrix. 
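
A small sketch of why a row cannot safely become a vector, again using an illustrative base-R data frame rather than the original tibble (the names `dta`, `row1`, and `m` are assumptions for the example):

```r
# Two columns of different types, as in the thread.
dta <- data.frame(ID = letters[1:3], num = 1:3, stringsAsFactors = FALSE)

# Row indexing keeps a one-row data frame, so each column keeps its type.
row1 <- dta[1, ]
col_types <- sapply(row1, class)   # ID is character, num is integer

# Forcing the row into a vector coerces everything to one common type.
flat <- unlist(row1)               # both elements become character strings

# A matrix holds a single type, so a row can be returned as a vector.
m <- matrix(1:6, nrow = 2)
m_row <- m[1, ]

print(col_types)
print(flat)
print(m_row)
```

The coercion in `unlist(row1)` is the loss of fidelity referred to above: the integer 1 survives only as the string "1".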
-- 
Sent from my phone. Please excuse my brevity.

On December 6, 2016 2:10:15 PM PST, Chris Evans <chrishold at psyctc.org> wrote:
>{{SIGH}} 
>
>You are absolutely right. 
>
>I wonder if I am losing some cognitive capacities that are needed to be
>part of the evolving R community. It seems to me that if a tibble is
>designed to be an enhanced replacement for a dataframe then it
>shouldn't quite so radically change things. 
>
>I notice that the documentation on tibble says "[ Never simplifies
>(drops), so always returns data.frame" 
>That is much less explicit than I would have liked and actually doesn't
>seem to be true. In fact, as you rightly say, it generally, but not
>quite always, returns a tibble. In fact it can be fooled into a vector
>of length 1. 
>
>> tmpTibble[[1,]] 
>Error in `[[.data.frame`(tmpTibble, 1, ) : 
>argument "..2" is missing, with no default 
>
>> tmpTibble[1] 
># A tibble: 26 × 1 
>ID 
><chr> 
>1 a 
>2 b 
>3 c 
>4 d 
>5 e 
>6 f 
>7 g 
>8 h 
>9 i 
>10 j 
># ... with 16 more rows 
>> tmpTibble[,1] 
># A tibble: 26 × 1 
>ID 
><chr> 
>1 a 
>2 b 
>3 c 
>4 d 
>5 e 
>6 f 
>7 g 
>8 h 
>9 i 
>10 j 
># ... with 16 more rows 
>> tmpTibble[1,] 
>Error in `[<-.data.frame`(`*tmp*`, , value = list(ID = c("a", "a", "a",
>: 
>replacement element 3 is a matrix/data frame of 26 rows, need 1 
>In addition: Warning messages: 
>1: In `[<-.data.frame`(`*tmp*`, , value = list(ID = c("a", "a", "a", : 
>replacement element 1 has 26 rows to replace 1 rows 
>2: In `[<-.data.frame`(`*tmp*`, , value = list(ID = c("a", "a", "a", : 
>replacement element 2 has 26 rows to replace 1 rows 
>> tmpTibble[1,1:26] 
>Error: Invalid column indexes: 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
>15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26 
>> tmpTibble[[1,2]] 
>[1] 1 
>> str(tmpTibble[[1,2]]) 
>int 1 
>> str(tmpTibble[[1:2,2]]) 
>Error in col[[i, exact = exact]] : 
>attempt to select more than one element in vectorIndex 
>> 
>> tmpTibble[[1,1:2]] 
>[1] "b" 
>> 
>
>So [[a,b]] works if a and b are legal with the dimensions of the tibble
>and if a is of length 1 but returns NOT a tibble but a vector of length
>1 (I think), I can see that's logical but not what it says in the
>documentation. 
>
>[[a]] and [[,a]] return the same result, that seems excessively
>tolerant to me. 
>
>[[a,b:c]] actually returns [[a,c]] and again as a single value, NOT a
>tibble. 
>
>And row subsetting/indexing has gone. 
>
>Why create replacement for a dataframe that has no row indexing and so
>radically redefines column indexing, in fact redefines the whole of
>indexing and subsetting? 
>
>OK. I will go to sleep now and hope to feel less dumb(ed) when I wake.
>Perhaps Prof. Wickham or someone can spell out a bit less tersely, and
>I think incompletely, than the tibble documentation does, why all this
>is good. 
>
>Thanks anyway Ista, you certainly hit the issue! 
>
>Very best all, 
>
>Chris 
>
>> From: "Ista Zahn" <istazahn at gmail.com>
>> To: "Chris Evans" <chrishold at psyctc.org>
>> Cc: "r-help at r-project.org" <r-help at r-project.org>
>> Sent: Tuesday, 6 December, 2016 21:40:41
>> Subject: Re: [R] Odd behaviour of mean() with a numeric column in a
>tibble
>
>> Not at a computer to check right now, but I believe single bracket
>indexing a
>> tibble always returns a tibble. To extract a vector use [[
>
>> On Dec 6, 2016 4:28 PM, "Chris Evans" < chrishold at psyctc.org > wrote:
>
>>> I hope I am obeying the list rules here. I am using a raw R IDE for
>this and
>> > running 3.3.2 (2016-10-31) on x86_64-w64-mingw32/x64 (64-bit)
>
>> > Here is a reproducible example. Code only first
>
>> > require(tibble)
>> > tmpTibble <- tibble(ID=letters,num=1:26)
>> > min(tmpTibble[,2]) # fine
>> > max(tmpTibble[,2]) # fine
>> > median(tmpTibble[,2]) # not fine
>> > mean(tmpTibble[,2]) # not fine
>
>> I think you want
>
>> mean(tmpTibble[[2]])
>
>> > newMeanFun <- function(x) {mean(as.numeric(unlist(x)))}
>> > newMeanFun(tmpTibble[,2]) # solved problem but surely shouldn't be
>necessary?!
>> > newMedianFun <- function(x) {median(as.numeric(unlist(x)))}
>> > newMedianFun(tmpTibble[,2]) # ditto
>> > str(tmpTibble[,2])
>
>> > ### then I tried this to make sure it wasn't about having fed in
>integers
>
>> > tmpTibble2 <- tibble(ID=letters,num=1:26,num2=(1:26)/10)
>> > tmpTibble2
>> > mean(tmpTibble2[,3]) # not fine, not about integers!
>
>
>>> ### before I just created tmpTibble2 I found myself trying to add a
>column to
>> > tmpTibble
>> > tmpTibble$newNum <- tmpTibble[,2]/10 # NO!
>> > tmpTibble[["newNum"]] <- tmpTibble[,2]/10 # NO!
>> > ### and oddly enough ...
>> > add_column(tmpTibble,newNum = tmpTibble[,2]/10) # NO!
>
>> > Now here it is with the output:
>
>> > > require(tibble)
>> > Loading required package: tibble
>> > > tmpTibble <- tibble(ID=letters,num=1:26)
>> > > min(tmpTibble[,2]) # fine
>> > [1] 1
>> > > max(tmpTibble[,2]) # fine
>> > [1] 26
>> > > median(tmpTibble[,2]) # not fine
>> > Error in median.default(tmpTibble[, 2]) : need numeric data
>> > > mean(tmpTibble[,2]) # not fine
>> > [1] NA
>> > Warning message:
>> > In mean.default(tmpTibble[, 2]) :
>> > argument is not numeric or logical: returning NA
>> > > newMeanFun <- function(x) {mean(as.numeric(unlist(x)))}
>> > > newMeanFun(tmpTibble[,2]) # solved problem but surely shouldn't
>be necessary?!
>> > [1] 13.5
>> > > newMedianFun <- function(x) {median(as.numeric(unlist(x)))}
>> > > newMedianFun(tmpTibble[,2]) # ditto
>> > [1] 13.5
>> > > str(tmpTibble[,2])
>> > Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 26 obs. of 1 variable:
>> > $ num: int 1 2 3 4 5 6 7 8 9 10 ...
>
>> > > ### then I tried this to make sure it wasn't about having fed in
>integers
>
>> > > tmpTibble2 <- tibble(ID=letters,num=1:26,num2=(1:26)/10)
>> > > tmpTibble2
>> > # A tibble: 26 × 3
>> > ID num num2
>> > <chr> <int> <dbl>
>> > 1 a 1 0.1
>> > 2 b 2 0.2
>> > 3 c 3 0.3
>> > 4 d 4 0.4
>> > 5 e 5 0.5
>> > 6 f 6 0.6
>> > 7 g 7 0.7
>> > 8 h 8 0.8
>> > 9 i 9 0.9
>> > 10 j 10 1.0
>> > # ... with 16 more rows
>> > > mean(tmpTibble2[,3]) # not fine, not about integers!
>> > [1] NA
>> > Warning message:
>> > In mean.default(tmpTibble2[, 3]) :
>> > argument is not numeric or logical: returning NA
>
>
>>> > ### before I just created tmpTibble2 I found myself trying to add
>a column to
>> > > tmpTibble
>> > > tmpTibble$newNum <- tmpTibble[,2]/10 # NO!
>> > > tmpTibble[["newNum"]] <- tmpTibble[,2]/10 # NO!
>> > > ### and oddly enough ...
>> > > add_column(tmpTibble,newNum = tmpTibble[,2]/10) # NO!
>> > Error: Each variable must be a 1d atomic vector or list.
>> > Problem variables: 'newNum'
>
>
>
>>> I discovered this when I hit odd behaviour after using read_spss()
>from the
>>> haven package for the first time as it seemed to be offering a step
>forward
>>> over good old read.spss() from the excellent foreign package. I am
>reporting it
>>> here not directly to Prof. Wickham as the issues seem rather general
>though I'm
>>> guessing that it needs to be fixed with a fix to tibble. Or perhaps
>I've
>> > completely missed something.
>
>> > TIA,
>
>> > Chris
>
>> > ______________________________________________
>> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>


