[R] A comment about R:
Petr Pikal
petr.pikal at precheza.cz
Fri Jan 6 18:26:06 CET 2006
Hi,
just a note on the difference between a matrix and a data.frame:
> str(data.frame(mat))
`data.frame': 4 obs. of 5 variables:
$ X1: num -0.1940 -0.7629 0.0446 -0.5408
$ X2: num -1.092 -0.040 1.070 0.868
$ X3: num 0.634 0.823 0.693 1.152
$ X4: num 0.0258 -1.6507 1.2052 0.9714
$ X5: num 0.673 0.380 -1.531 -0.426
> str((mat))
num [1:4, 1:5] -0.1940 -0.7629 0.0446 -0.5408 -1.0925 ...
A matrix is a vector (numeric here) with a dim attribute; a data frame is a
matrix-like structure, internally a list of equal-length columns, which can
hold variables (columns) of different types.
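A minimal sketch of that difference, using only base R. The seed and the extra `label` column are assumptions for illustration and are not from the session above:

```r
set.seed(1)                       # arbitrary seed, only for reproducibility
mat <- matrix(rnorm(20), nrow = 4, ncol = 5)

# A matrix is an atomic vector plus a dim attribute
attributes(mat)                   # $dim: 4 5
length(mat)                       # 20, all elements live in one vector
is.list(mat)                      # FALSE

# A data frame is a list with one element per column
df <- data.frame(mat)
is.list(df)                       # TRUE
length(df)                        # 5, the number of columns

# Columns of a data frame may differ in type; a matrix has a single type
df$label <- c("a", "b", "c", "d")
sapply(df, class)
is.character(as.matrix(df))       # TRUE: everything is coerced to character
```

The last line shows why a data frame cannot simply be thought of as a matrix: converting a mixed-type data frame to a matrix forces all columns to one common type.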
sd is a function based on var:
> sd
function (x, na.rm = FALSE)
{
    if (is.matrix(x))
        apply(x, 2, sd, na.rm = na.rm)
    else if (is.vector(x))
        sqrt(var(x, na.rm = na.rm))
    else if (is.data.frame(x))
        sapply(x, sd, na.rm = na.rm)
    else sqrt(var(as.vector(x), na.rm = na.rm))
}
<environment: namespace:stats>
and therefore behaves in a similar manner for data frames and matrices,
but mean has methods only for data frames, numeric vectors and dates
(from ?mean):
Arguments:
x: An R object. Currently there are methods for numeric data
frames, numeric vectors and dates. A complex vector is
allowed for 'trim = 0', only.
So a matrix is treated as one numeric vector by mean, but as a
set of column vectors by sd.
I don't know why.
I believe it is because with var(matrix) you expect a
variance-covariance matrix of the columns as output, so sd works
column-wise as well.
Maybe somebody can explain it better.
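The contrast can be sketched with explicit column-wise calls, which do not depend on how mean or sd themselves treat a matrix. The seed and variable names are assumptions for illustration:

```r
set.seed(1)                       # arbitrary seed, only for reproducibility
mat <- matrix(rnorm(20), nrow = 4, ncol = 5)

m_all  <- mean(mat)               # one number: the matrix is read as a plain vector
m_cols <- colMeans(mat)           # five numbers, one per column
s_cols <- apply(mat, 2, sd)       # five numbers, one per column
v      <- var(mat)                # 5 x 5 variance-covariance matrix of the columns

# The column standard deviations are the square roots of the diagonal of var(mat)
all.equal(s_cols, sqrt(diag(v)))  # TRUE

# Pitfall: a single mean is silently recycled against the vector of sds
length(m_all + 4 * s_cols)        # 5, and no warning is given
length(m_cols + 4 * s_cols)       # 5 as well, but this is the intended result
```

Both sums have the same length, so nothing signals that the first one mixes an overall mean with per-column standard deviations.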
If you want mean to behave for matrices the way sd does, you can
try
mymean <- function(x, na.rm = FALSE)
{
    if (is.matrix(x))
        colMeans(x, na.rm = na.rm)
    else mean(x, na.rm = na.rm)
}
> mymean(mat)
[1] -0.3632682 0.2013843 0.8251625 0.1379205 -0.2259909
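If the same column-wise result is wanted for data frames too, without relying on a data frame method of mean, the idea could be extended roughly as follows. This is only a sketch: `mymean2` is a hypothetical name, and it assumes every column of the data frame is numeric.

```r
# Hypothetical helper, not part of R: column-wise means for matrices and data frames
mymean2 <- function(x, na.rm = FALSE) {
  if (is.matrix(x)) {
    colMeans(x, na.rm = na.rm)
  } else if (is.data.frame(x)) {
    sapply(x, mean, na.rm = na.rm)   # assumes every column is numeric
  } else {
    mean(x, na.rm = na.rm)
  }
}

mat <- matrix(1:6, nrow = 2)         # columns: (1,2), (3,4), (5,6)
mymean2(mat)                         # 1.5 3.5 5.5
mymean2(data.frame(mat))             # same values, named X1 X2 X3
mymean2(c(1, 2, NA), na.rm = TRUE)   # 1.5
```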
HTH
Petr
On 6 Jan 2006 at 16:18, Stefan Eichenberger wrote:
From: "Stefan Eichenberger" <Stefan.Eichenberger at se-kleve.com>
To: <r-help at stat.math.ethz.ch>
Date sent: Fri, 6 Jan 2006 16:18:16 +0100
Subject: [R] A comment about R:
> ~~~~~~~~~~~~~~~
> ... blame me for not having sent below message initially in
> plain text format. Sorry!
> ~~~~~~~~~~~~~~~
>
> I just got into R over most of the Xmas vacation and was about to ask
> for helpful pointers on how to get a grip on R when I came across this
> thread. I've read through most of it and would like to comment from a
> novice user's point of view. I have a strong programming background but
> limited statistical experience and no knowledge of competing
> packages. I work as a senior engineer in electronics.
>
> Yes, the learning curve is steep. Most of the documentation is
> extremely terse. Learning is mostly from examples (a wiki was proposed
> in another mail...), and the documentation uses no graphical elements
> at all. So, when it comes to things like xyplot in lattice: where would
> I get the concepts behind panels, superpanels, and the like?
>
> OK, this is steep and terse, but after a while I'll get over it...
> That's life. The general concept is great, things can be expressed
> very densely: the potential is there... I quickly had 200 lines of my
> own code together, doing what it should - or so I believed.
>
> Next I did:
> matrix<-matrix(1:100, 10, 10)
> image(matrix)
> locator()
> Great: I can interactively work with my graphs... But then:
> filled.contour(matrix)
> locator()
> Oops - wrong coordinates returned. Bug. Apparently, locator() doesn't
> realize that filled.contour() has a color bar to the right and scales
> x wrongly...
>
> Here is what really shocked me:
>
> > str(bar)
> `data.frame': 206858 obs. of 12 variables: ...
> > str(mean(bar[,6:12]))
> Named num [1:7] 1.828 2.551 3.221 1.875 0.915 ...
> ...
> > str(sd(bar[,6:12]))
> Named num [1:7] 0.0702 0.1238 0.1600 0.1008 0.0465 ...
> ...
> > prcomp(bar[,6:12])->foo
> > str(foo$x)
> num [1:206858, 1:7] -0.4187 -0.4015 0.0218 -0.4438 -0.3650 ... ...
> > str(mean(foo$x))
> num -1.07e-13
> > str(sd(foo$x))
> Named num [1:7] 0.32235 0.06380 0.02254 0.00337 0.00270 ...
> ...
>
> So, sd returns a vector regardless of whether the argument is a
> matrix or a data.frame, but mean reacts differently and returns a
> vector only for a data.frame?
>
> The problem here is not that this is difficult to learn - the problem
> is the complete absence of a concept. Is a data.frame an 'extended'
> matrix with columns of different types, or something different? Since
> the single numeric mean (I expected a vector) is recycled nicely when
> used in a vector context, this makes debugging code close to
> impossible. Since sd returns a vector, things like mean + 4*sd vary
> enough across the data elements that I assumed the code was working...
> I don't get any warning signal that something is wrong here.
>
> A case in point is the behavior of locator() on a filled.contour()
> plot: things apparently have been programmed and debugged from
> example rather than from concept.
>
> Now, in another posting I read that all this is a feature to
> discourage inexperienced users from statistics and to force you to
> think before you do things. Whilst I support this concept of thinking:
> did I miss something in statistics? I was under the impression that
> mean and sd were relatively close to each other conceptually... (here,
> they are even in different packages...)
>
> I will continue using R for the time being. But whether I can
> recommend it to my work colleagues remains to be seen: how could I
> ever trust the results returned?
>
> I'm still impressed by some of the efficiency, but my trust is deeply
> shaken...
> ----------------------------------------------------------------------
> - Stefan Eichenberger mailto:Stefan.Eichenberger at se-kleve.com
Petr Pikal
petr.pikal at precheza.cz