[R] Normalization and missing values
Jonathan Baron
baron at psych.upenn.edu
Wed Apr 13 19:37:57 CEST 2005
On 04/13/05 11:36, Chris Bergstresser wrote:
> Hi all --
>    I've got a large dataset which consists of a bunch of different
> scales, and I'm preparing to perform a cluster analysis.  I need to
> normalize the data so I can calculate the difference matrix.
>    First, I didn't see a function in R which does normalization -- did
> I miss it?  What's the best way to do it?
Look at scale(). It might be what you mean: by default it centers each
column on its mean and divides by its standard deviation.
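
A minimal sketch of that, assuming "normalization" here means
standardizing each scale to mean 0 and standard deviation 1. The data
frame and its column names are made up for illustration, and the
choice of average-linkage hclust() is only one possibility:

```r
# Hypothetical data: three scales measured on very different ranges.
d <- data.frame(
  anxiety = c(10, 12, 15, 11, 14),
  income  = c(30000, 52000, 41000, 75000, 38000),
  rating  = c(1, 5, 3, 4, 2)
)

# Standardize every column (default: center = TRUE, scale = TRUE).
z <- scale(d)

# Check: column means are (numerically) 0 and standard deviations are 1.
print(round(colMeans(z), 10))
print(apply(z, 2, sd))

# Euclidean distance matrix on the standardized data, then a clustering.
dm <- dist(z)
hc <- hclust(dm, method = "average")
```

Without the scale() step, the income column would dominate the
distances simply because its numbers are larger.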
>    Second, what's the best way to deal with missing values?  Obviously,
> I could just set them to 0 (the mean of the normalized scales), but I'm
> not sure that's the best way.
Lots of ways to deal with missing data. The ones I've found most
helpful are in the Hmisc package, particularly transcan() and
aregImpute(). See
http://www.psych.upenn.edu/~baron/rpsych/rpsych.html#SECTION000715000000000000000
for an example of the latter. But, in general, the "right" way
to deal with missing data depends on the assumptions you make.
As a novice, I found the following article to be helpful:
Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of
the state of the art. Psychological Methods, 7, 147-177.
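
To make the simplest options concrete, here is a rough sketch in base
R only. It is not the Hmisc approach mentioned above, and the data are
made up. It compares setting missing values to 0 after scaling (which
is mean imputation) with leaving them as NA and letting dist() work
with the columns each pair of rows has in common:

```r
# Hypothetical data with some missing entries.
d <- data.frame(
  a = c(1, 2, NA, 4, 5),
  b = c(10, NA, 30, 40, 50),
  c = c(3, 1, 4, 1, 5)
)

# scale() leaves NAs out when computing column means and standard
# deviations, and keeps them as NA in the result.
z <- scale(d)

# Option 1: replace NA with 0, i.e. the column mean on the standardized scale.
z_zero <- z
z_zero[is.na(z_zero)] <- 0
d_zero <- dist(z_zero)

# Option 2: keep the NAs. For each pair of rows, dist() drops the columns
# with missing values and scales the sum up for the number of columns used.
d_na <- dist(z)

print(round(d_zero, 3))
print(round(d_na, 3))
```

Neither option is "right" in general. Both make assumptions about why
the values are missing, which is the point of the article above.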
--
Jonathan Baron, Professor of Psychology, University of Pennsylvania
Home page: http://www.sas.upenn.edu/~baron
R search page: http://finzi.psych.upenn.edu/