[R] Strange data frame
McGehee, Robert
Robert.McGehee at geodecapital.com
Fri Apr 22 01:02:32 CEST 2005
Hello,
I'm playing around with the pls package and found a data set (NIR) whose
structure I don't understand. Forgive me if this is a stupid question; I
suspect it is, since I am less experienced with some aspects of modeling.
My problem: the NIR data frame in pls does not seem to be a typical data
frame because, although it is a list, its variables are not of equal
length. Furthermore, I have no idea how to reproduce such a structure.
Let's look at the NIR data:
> require(pls)
> data(NIR)
> class(NIR)
[1] "data.frame"
> str(NIR)
`data.frame': 28 obs. of 3 variables:
$ X : num [1:28, 1:268] 3.07 3.07 3.08 3.08 3.10 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : NULL
$ y : num 100.0 80.2 79.5 60.8 60.0 ...
$ train: logi TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE ...
> class(NIR$X)
[1] "matrix"
> class(NIR$y)
[1] "numeric"
> length(NIR$X)
[1] 7504
> length(NIR$y)
[1] 28
OK, it looks to me as though NIR is a data frame (i.e. "a list of
variables of the same length with unique row names") with a matrix of
length 7504 as one variable and a numeric vector of length 28 as
another, which seems to contradict the definition of a data frame.
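The two lengths can be compared against the dimensions that str() reports for X (28 rows by 268 columns). The sketch below uses a placeholder zero matrix of the same shape rather than the real NIR data, so it runs without the pls package; it only checks the arithmetic.

```r
# Placeholder with the same shape str() reports for NIR$X (28 x 268);
# the values are zeros, not the real spectra.
X <- matrix(0, nrow = 28, ncol = 268)
y <- numeric(28)

length(X)             # 7504: length() counts every cell, 28 * 268
nrow(X)               # 28: one row per observation
nrow(X) == length(y)  # TRUE: the rows line up with the vector y
```

Seen this way, the matrix has as many rows as y has elements; only length() reports the total cell count.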
Moreover, despite my best efforts, I'm unable to put any of my own data
into this structure, because the data.frame() and as.data.frame()
functions remove the matrix structure, i.e.
> data.frame(y = NIR$y, X = NIR$X) ## or
> as.data.frame(list(y = NIR$y, X = NIR$X))
return a different animal altogether.
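The flattening described above can be reproduced with a small made-up example (the toy y and X below are illustrative, not the NIR data). The second half shows one possible way of keeping X whole, by assigning the matrix to a column of an existing data frame; this is an assumed workaround, not something taken from the pls sources.

```r
y <- c(1.5, 2.5, 3.5)
X <- matrix(1:6, nrow = 3)   # 3 x 2 matrix without column names

# data.frame() splits the matrix into one column per matrix column
flat <- data.frame(y = y, X = X)
names(flat)        # "y" "X.1" "X.2"

# Assigning the matrix after construction keeps it as a single
# matrix-valued column with 3 rows
kept <- data.frame(y = y)
kept$X <- X
ncol(kept)         # 2
is.matrix(kept$X)  # TRUE
dim(kept$X)        # 3 2
```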
Lastly, this particular structure is useful because the pls authors can
concisely write models such as
mvr(y ~ X, data = NIR[NIR$train, ])
instead of what I imagine would be a more complicated alternative if
they did not have a data frame holding a matrix and a vector. Any
pointers to something I overlooked would be appreciated.
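To illustrate why such a structure keeps the model call short, here is a sketch with simulated data. It uses base R's lm() as a stand-in for mvr(), so it runs without the pls package; the data, the train split and the names d, d_train and fit are all hypothetical. Row-subsetting the data frame subsets the rows of the matrix column as well, and the formula term X brings in every column of the matrix as a predictor.

```r
set.seed(1)
n <- 10
d <- data.frame(y = rnorm(n),
                train = rep(c(TRUE, FALSE), length.out = n))
# Hypothetical matrix column: 10 rows, 3 predictor columns
d$X <- matrix(rnorm(n * 3), nrow = n)

# Logical row indexing also subsets the rows of the matrix column
d_train <- d[d$train, ]
dim(d_train$X)   # 5 3

# lm() stands in for mvr(); X expands to its 3 columns in the formula
fit <- lm(y ~ X, data = d_train)
length(coef(fit))  # 4: intercept plus one coefficient per column of X
```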
Best,
Robert
More information about the R-help mailing list