# [R] subset of matrix vs data frame

Denis White denis at mail.cor.epa.gov
Thu Jun 1 21:09:44 CEST 2000

```
On Wed, 31 May 2000, Peter Dalgaard BSA wrote:

> Prof Brian D Ripley <ripley at stats.ox.ac.uk> writes:
>
> > On Wed, 31 May 2000, Denis White wrote:
> >
> > > In Splus this works,
> > >
> > > > a <- data.frame(matrix(round(runif(10),0),nrow=2,ncol=5))
> > > > a[a == 1] <- 2
> > >
> > > but in R only if a is matrix,
> > >
> > > > a[a == 1] <- 2
> > > Error in [<-.data.frame(*tmp*, a == 1, value = 2) :
> > >         matrix subscripts not allowed in replacement
> > >
> > > Was this a design decision?  Sorry if I missed it in
> > > An Introduction to R.
> >
> > S-PLUS differs from the original S here, as I understand it.
> > So it was an S-PLUS design decision as I understand it. S-PLUS says:
> >
> >         else if(nargs() == 3) {
> > # really ambiguous, but follow common use as if list,
> > # except when one subscript is a logical matrix the shape of x, then treat
> > # as if x were a matrix.
>
> Semantically it is a rather strange thing to do since elements
> in different columns of a data matrix can be of different type.
> And some really weird stuff *does* happen in Splus 3.4:
>
> >  a<-data.frame(a=1:10,b=factor(1:10),c=I(as.character(1:10)))
> > a[a==5]<-"x"
> Warning messages:
>   replacement values not all in levels(x): NA's generated in:
> > "[<-.factor"(.A0,
>         i[, k, drop = T], value = .A1)
> > a
>     a  b  c
>  1  1  1 1
>  2  2  2 2
>  3  3  3 3
>  4  4  4 4
>  5  x NA x
>  6  6  6 6
>  7  7  7 7
>  8  8  8 8
>  9  9  9 9
> 10 10 10 10
> > a[a==4]<-"y"
> Warning messages:
> 1: Data length is not an even multiple of group length in:
> > split(Value,
>         factor(col(i)[i], levels = seq(len = ncol(i))))
> 2: replacement values not all in levels(x): NA's generated in:
> > "[<-.factor"(.\
>         A0, i[, k, drop = T], value = .A1)
> 3: replacement values not all in levels(x): NA's generated in:
> > "[<-.factor"(.\
>         A0, i[, k, drop = T], value = .A1)
> > a
>     a  b  c
>  1  1  1 1
>  2  2  2 2
>  3  3  3 3
>  4 NA NA 4
>  5  x NA x
>  6  6  6 6
>  7  7  7 7
>  8  8  8 8
>  9  9  9 9
> 10 10 10 10
>

Your argument is well taken.

I'm preparing a package for R with accompanying data.  As I
understand data(), either of the transportable formats (.tab, .csv)
results in a data frame.  One of my data objects is probably best
modeled as a matrix, but the R applications I'm using (clustering)
are flexible, so should I just go ahead with this object as a data
frame?

On page 64 of the white book, John Chambers says "Data frames
can be treated as matrices in calls to most of the basic functions
treating arrays: subsets and elements, dim(), ..."  Perhaps he
had not contemplated the semantic problems?
```
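The thread's two points can be sketched in R: a logical-matrix replacement is unambiguous once the object is a true matrix, and a data frame with mixed column types has no single element type to replace into. This is a minimal sketch, not the posters' code. The object names `m`, `a2`, and `b`, and the `set.seed()` call, are assumptions added for illustration; only `a` comes from the original post.

```r
# Illustrative data: a 2 x 5 data frame of 0/1 values, built as in the post.
# set.seed() is an addition here so the example is repeatable.
set.seed(1)
a <- data.frame(matrix(round(runif(10), 0), nrow = 2, ncol = 5))

# Convert to a numeric matrix, where indexing by a logical matrix
# of the same shape is well defined.
m <- as.matrix(a)
m[m == 1] <- 2

# Convert back if a data frame is needed downstream.
a2 <- as.data.frame(m)

# With mixed column types there is no common numeric type:
# as.matrix() falls back to a character matrix.
b <- data.frame(x = 1:3, y = factor(c("u", "v", "w")))
mode(as.matrix(b))
```

For an all-numeric object loaded through `data()`, converting once with `as.matrix()` (or `data.matrix()`) right after loading is one way to get matrix semantics without changing the file format.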