[R] R newbie | sapply and FUN error
egc
forum.query at gmail.com
Thu May 20 23:42:32 CEST 2010
Greetings -
While I've used R a fair bit for basic statistical machinations, I've
not used it for data manipulation; I've used SAS for 20+ years (and
SAS really shines in data handling). So I've started the process of
trying to figure out 'how to do in R what I can do in my sleep in SAS',
specifically with respect to data manipulation. These are decidedly
'newbie' level questions.
So, starting very simple, I created a tiny example CSV file, which I
call test.csv.
Loc,cost
A,1
C,3
D,2
F,3
H,4
K,3
M,8
Now, all I want to do is read it in, and derive a new variable which
is a Z-transform of 'cost'. Here is what I've tried so far:
> prices <- read.csv("c:/documents and settings/user/desktop/test.csv",header=TRUE,sep=",",na.strings=".");
> print(prices$cost);
So far, so good (being able to pull in the data is a good thing).
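For anyone who wants to reproduce this without the file on disk, here is a minimal sketch that reads the same seven rows from an inline string via the `text =` argument of `read.csv()` (available in current R releases) instead of from the c:/ path. `header = TRUE` and `sep = ","` are already the `read.csv()` defaults, and `na.strings = "."` makes SAS-style "." cells come in as NA. The variable name `csv_text` is chosen here for illustration.

```r
# Same data as test.csv, read from an inline string so the example is self-contained
csv_text <- "Loc,cost
A,1
C,3
D,2
F,3
H,4
K,3
M,8"

# header = TRUE and sep = "," are the read.csv() defaults;
# na.strings = "." turns SAS-style "." cells into NA
prices <- read.csv(text = csv_text, na.strings = ".")

str(prices)        # structure: 7 rows, columns Loc and cost
print(prices$cost) # the cost column as a numeric vector
```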
Now, while I'm sure there are lots of ways to do what I want, I'm
going to brute-force it: calculate the column mean and column SD for
'cost', generate the Z-transformed value, and then add it to the
dataframe. However, this is where I'm having problems. After about an
hour of searching, I gathered that I need to use an 'apply' function to
apply a function (say, mean) to a column in a dataframe. But I can't
seem to get it to work successfully (and this is the gist of the
question).
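The brute-force plan (column mean, column SD, Z-transform, new column) can be sketched without any apply function, because `prices$cost` is already a plain vector. This is a minimal sketch assuming the seven rows shown above; the new column names `zcost` and `zcost2` are hypothetical, and `scale()` is shown only as a built-in alternative.

```r
prices <- read.csv(text = "Loc,cost\nA,1\nC,3\nD,2\nF,3\nH,4\nK,3\nM,8")

# A single column is already a vector, so mean() and sd() can be called on it directly
cost_mean <- mean(prices$cost, na.rm = TRUE)
cost_sd   <- sd(prices$cost, na.rm = TRUE)

# Z-transform and add it to the data frame as a new (hypothetically named) variable
prices$zcost <- (prices$cost - cost_mean) / cost_sd

# Built-in alternative: scale() centres and divides by the SD, returning a
# one-column matrix, so [, 1] turns it back into a plain vector
prices$zcost2 <- scale(prices$cost)[, 1]

print(prices)
```

Both columns should agree; `scale()` is convenient when several columns need the same treatment.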
If I try
> result <- sapply(prices['cost'],MARGIN=2,FUN=mean,na.rm=TRUE);
> print(result);
it works perfectly.
But if I simply change FUN=mean to FUN=sd, it is not so successful.
If I try
> result <- sapply(prices['cost'],MARGIN=2,FUN=sd,na.rm=TRUE);
> print(result);
it throws the following error:
Error in FUN(X[[1L]], ...) : unused argument(s) (MARGIN = 2)
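A likely explanation, based on the documented signatures (worth checking against `?sapply`, `?mean` and `?sd`): `sapply(X, FUN, ...)` has no `MARGIN` argument, so `MARGIN = 2` is passed through `...` to `FUN`. `mean()` has its own `...` and quietly ignores the extra argument, whereas `sd(x, na.rm = FALSE)` has no `...`, hence "unused argument". The sketch below reproduces both behaviours; `tryCatch()` is used only to capture the error message instead of stopping.

```r
prices <- read.csv(text = "Loc,cost\nA,1\nC,3\nD,2\nF,3\nH,4\nK,3\nM,8")

# sapply() has no MARGIN argument; anything it doesn't recognise is forwarded to FUN.
# mean() has a '...' argument, so an unexpected MARGIN = 2 is silently ignored:
m <- mean(prices$cost, MARGIN = 2)

# sd() is defined as sd(x, na.rm = FALSE) with no '...', so the same extra
# argument is rejected. tryCatch() captures the error message instead of stopping.
sd_msg <- tryCatch(sd(prices$cost, MARGIN = 2),
                   error = function(e) conditionMessage(e))
print(sd_msg)

# Dropping MARGIN makes both calls work; prices["cost"] is a one-column
# data frame (a list), so sapply() applies FUN to that single column.
res_mean <- sapply(prices["cost"], FUN = mean, na.rm = TRUE)
res_sd   <- sapply(prices["cost"], FUN = sd,   na.rm = TRUE)
print(res_mean)
print(res_sd)
```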
Further, if I try
> result <- sapply(prices$cost,MARGIN=2,FUN=mean,na.rm=TRUE);
> print(result);
it prints 7 values, one for each element of the data set, meaning
it's treating prices$cost as a row vector, which makes no sense to me.
What do I have to do to use prices$cost as the first argument in the
sapply call? If I can't, why not?
is.vector(prices$cost) returns TRUE, so why can't I take the mean over
the vector?
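The behaviour follows from what `sapply()` iterates over: the elements of its first argument. For an atomic vector those are the individual numbers, so `mean()` is applied once per value; for a data frame (a list of columns) they are the columns. A minimal sketch using the cost values from test.csv typed in directly; the names `per_element`, `per_column` and `whole` are illustrative.

```r
cost <- c(1, 3, 2, 3, 4, 3, 8)

# sapply() walks over the *elements* of its first argument. For an atomic
# vector that means each single number, so mean() is taken once per value:
per_element <- sapply(cost, FUN = mean)
print(per_element)

# For a data frame (a list of columns) the elements are the columns,
# which is why sapply(prices["cost"], mean) returned one number.
prices <- data.frame(Loc = c("A", "C", "D", "F", "H", "K", "M"), cost = cost)
per_column <- sapply(prices["cost"], FUN = mean)

# When you already have the column as a vector, no apply function is needed:
whole <- mean(prices$cost)
print(whole)
```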
At any rate, I'll start from here. Being able to apply functions to
column(s) of a dataframe seems pretty fundamental, so I'd like to
start by understanding the basics.
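Since the broader goal is applying functions to several columns, here is a sketch with a hypothetical second numeric column `qty` (invented for illustration, not part of test.csv). `apply()` is the function that actually takes `MARGIN`; it converts the data frame to a matrix first, so it is best kept to all-numeric columns. `sapply()` iterates over columns directly, and `colMeans()` is a vectorised shortcut for means.

```r
# Hypothetical data: 'qty' is invented for illustration and not part of test.csv
prices <- data.frame(Loc  = c("A", "C", "D", "F", "H", "K", "M"),
                     cost = c(1, 3, 2, 3, 4, 3, 8),
                     qty  = c(10, 12, 9, 11, 15, 10, 20))

num_cols <- c("cost", "qty")

# apply() is where MARGIN belongs: MARGIN = 2 means "by column" of a matrix
sd_by_apply <- apply(prices[num_cols], MARGIN = 2, FUN = sd, na.rm = TRUE)

# sapply() treats the data frame as a list of columns, so no MARGIN is needed
sd_by_sapply <- sapply(prices[num_cols], FUN = sd, na.rm = TRUE)

# colMeans() computes all column means in one vectorised call
means <- colMeans(prices[num_cols], na.rm = TRUE)

# Z-transform every numeric column at once and append as new columns
z <- scale(prices[num_cols])
prices[paste0("z_", num_cols)] <- as.data.frame(z)

print(sd_by_sapply)
print(means)
print(prices)
```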
Thanks in advance.