[R] Infelicity in print output with matrix indexing of `[.data.frame`

peter dalgaard pdalgd at gmail.com
Sun Dec 18 22:50:12 CET 2016


> On 18 Dec 2016, at 19:51 , Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:
> 
> Ah, "why"... perhaps because the speed reduction involved in successive indexing operations on data frames was considered unacceptable to the programmer? (Also the code would essentially have to check for type conversion of the result vector as every row of the index matrix was retrieved.) Perhaps for backward compatibility?

More likely, to avoid having the type of the result depend on the value of the index. Also, sub-index consistency: does one really want D[M][1:2] to be of a different type than D[M[1:2,]]?
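Here is a minimal sketch with made-up data. The data frame `D` and index matrix `M` are hypothetical. Because `[.data.frame` handles a matrix index with `as.matrix(x)[i]`, a single character column makes every result character, whichever cells are selected. As a result, `D[M][1:2]` and `D[M[1:2,]]` agree:

```r
# Hypothetical example data: two numeric columns and one character column.
D <- data.frame(x = c(1.5, 2.5, 3.5),
                y = c(10, 20, 30),
                z = c("a", "b", "c"),
                stringsAsFactors = FALSE)

# Two-column index matrix: each row is a (row, column) pair.
# Every selected cell lies in a numeric column (x or y).
M <- cbind(c(1, 2, 3), c(1, 2, 1))

# `[.data.frame` coerces the whole frame with as.matrix() first,
# so the character column z forces a character result.
r_all  <- D[M]
r_head <- D[M[1:2, ]]

class(r_all)                    # "character"
identical(r_all[1:2], r_head)   # TRUE: same type either way

# The type is fixed by the data frame, not by which cells are picked.
class(D[cbind(1, 3)])   # "character" (cell from column z)
class(D[cbind(1, 1)])   # "character" (cell from numeric column x)
```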

-pd 

> 
> You could code your own version that behaved the way you like, but I think the usual expectation is that indexing should be faster than an R for loop, so hiding such behavior behind [.data.frame seems a bit deceptive to me. 
> 
> It seems much more straightforward to me to explicitly convert that portion of the data frame that you intend to do matrix indexing with into a matrix of known type for the purposes of this task, rather than expecting [.data.frame to figure out that you don't plan to retrieve values from the non-numeric columns of the data frame. (Sometimes the fact that things are hard is a hint that you should re-think your solution.)
> -- 
> Sent from my phone. Please excuse my brevity.
> 
> On December 18, 2016 10:00:45 AM PST, David Winsemius <dwinsemius at comcast.net> wrote:
>> 
>>> On Dec 17, 2016, at 3:15 PM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:
>>> 
>>> No, cannot agree. The result of using an n by 2 matrix to index into
>>> a rectangular object is a vector. A vector can only have one storage
>>> mode for all elements. Some type coercion is necessary to accommodate
>>> this.
>> 
>> I have no argument with the premise that an atomic vector must be of a
>> single mode. But the exact same values were assigned from a numeric
>> vector into the positions indexed by the 2-column matrix. Why does
>> extraction need to coerce the entire data frame to a matrix when none
>> of the extracted values are character? My request concerns this very
>> simple line in `[.data.frame`:
>> 
>> 
>>   if (is.matrix(i)) 
>>           return(as.matrix(x)[i])
>> 
>> If it were replaced by code that extracted only the needed values and
>> then used a shifted version of the selection matrix, the values you got
>> back would not be coerced just because an irrelevant data frame column
>> happened to be present, like an innocent bystander.
>> 
>>   as.matrix(x[min(i[, 1]):max(i[, 1]), min(i[, 2]):max(i[, 2])])[
>>     cbind(i[, 1] - min(i[, 1]) + 1, i[, 2] - min(i[, 2]) + 1)]
> 
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
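Jeff's suggestion, converting only the part of the data frame you intend to index into a matrix of known type, can be sketched as follows. This sketch assumes the relevant numeric columns are known by name. The names `D`, `M`, `num_cols` and `Dnum` are hypothetical. Column numbers in the index matrix then refer to positions among the selected columns, not in `D`:

```r
# Hypothetical example data: two numeric columns and one character column.
D <- data.frame(x = c(1.5, 2.5, 3.5),
                y = c(10, 20, 30),
                z = c("a", "b", "c"),
                stringsAsFactors = FALSE)

# Convert only the columns meant for matrix indexing, once, so the
# storage mode of any result is known in advance (double here).
num_cols <- c("x", "y")
Dnum <- as.matrix(D[, num_cols])
stopifnot(is.numeric(Dnum))

# Column numbers now index into num_cols (1 = x, 2 = y), not into D.
M <- cbind(c(1, 2, 3), c(1, 2, 1))
vals <- Dnum[M]   # c(1.5, 20, 3.5), stored as double
```

The coercion happens once and explicitly, so nothing inside `[` has to guess which columns matter.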

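For comparison, David's proposed expression can be wrapped in a function. `bbox_index()` is a hypothetical helper, not part of base R. `drop = FALSE` is added here as a precaution so that the subset stays a data frame. The sketch also shows the objection about type consistency raised above: because only the bounding box of the requested cells is coerced, the result type now depends on which cells the index selects.

```r
# Hypothetical helper wrapping the proposed bounding-box approach.
bbox_index <- function(x, i) {
  rows <- min(i[, 1]):max(i[, 1])
  cols <- min(i[, 2]):max(i[, 2])
  # Coerce only the rectangle spanned by the requested cells ...
  sub <- as.matrix(x[rows, cols, drop = FALSE])
  # ... then shift the index so it refers to positions inside that rectangle.
  sub[cbind(i[, 1] - min(i[, 1]) + 1, i[, 2] - min(i[, 2]) + 1)]
}

D <- data.frame(x = c(1.5, 2.5, 3.5),
                y = c(10, 20, 30),
                z = c("a", "b", "c"),
                stringsAsFactors = FALSE)

a <- bbox_index(D, cbind(c(1, 2, 3), c(1, 2, 1)))  # box covers x:y only
b <- bbox_index(D, cbind(c(1, 2), c(1, 3)))        # box reaches column z
class(a)   # "numeric"
class(b)   # "character"
```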
-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com


